AI-Powered Financial Analysis

JJF - Salzbach Capital

March 10, 2025

The role of financial analysts has always been to uncover hidden insights in financial statements and exploit market inefficiencies. But what if an AI could match or even surpass them?

A recent study put ChatGPT to the test, challenging it to analyze financial statements and predict future earnings - without any narrative or contextual information. The results were surprising: ChatGPT not only held its own but even outperformed human analysts in certain scenarios. Could this signal a shift in the financial industry, where AI takes on a more central role in decision-making?

This article dives into the data, methodologies, and findings of "Financial Statement Analysis with Large Language Models" by Kim, Muhn, and Nikolaev (2024) and explores its implications for the future of financial analysis.

It is important to note that all data, methods, and findings are based on the work of Kim et al. (2024), as referenced at the end of this post. My contribution is merely to highlight the paper, summarize its results, and share my opinion on it.

The study relies on two key data sources:

Compustat (1968–2021)

To ensure a robust test, the researchers compiled 150,678 firm-year observations from 15,401 unique firms in the Compustat database. This dataset includes:
- Balance sheets (two years) and income statements (three years)
- No company names or dates, ensuring anonymity and preventing AI from leveraging prior knowledge
- Firms with at least $1 million in total assets and a stock price above $1 per share at year-end
- Only firms with a December 31 fiscal year-end for consistency
By focusing solely on numerical financial data, the study effectively removed industry-specific narratives and managerial commentary, making it a pure test of financial statement analysis capabilities.
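
The sample screens listed above can be sketched in a few lines of Python. This is only an illustration: the `FirmYear` record and its field names are hypothetical, not Compustat variable names, and the paper's actual data pipeline may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FirmYear:
    # Hypothetical record layout; field names are illustrative, not Compustat's.
    total_assets_musd: float   # total assets, in millions of USD
    price_year_end: float      # share price at year-end, in USD
    fiscal_year_end: date


def passes_sample_filters(obs: FirmYear) -> bool:
    """Apply the three screens described in the text to one firm-year."""
    big_enough = obs.total_assets_musd >= 1.0          # at least $1 million in assets
    above_one_dollar = obs.price_year_end > 1.0        # price above $1 per share
    december_year_end = (obs.fiscal_year_end.month, obs.fiscal_year_end.day) == (12, 31)
    return big_enough and above_one_dollar and december_year_end


# Made-up observations, one passing and three failing a single screen each.
sample = [
    FirmYear(250.0, 14.20, date(2019, 12, 31)),
    FirmYear(0.4, 3.10, date(2019, 12, 31)),    # too small
    FirmYear(80.0, 0.75, date(2019, 12, 31)),   # price not above $1
    FirmYear(120.0, 22.00, date(2019, 6, 30)),  # June fiscal year-end
]
kept = [obs for obs in sample if passes_sample_filters(obs)]
```

Only the first made-up observation survives all three screens.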

I/B/E/S (1983–2021) – The Analyst Benchmark

To compare ChatGPT’s predictions against human analysts, the study used 39,533 firm-year observations from the I/B/E/S database. These records capture:
- Consensus analyst forecasts made one month, three months, and six months after a company’s earnings release
- Predictions based on earnings estimates from at least three different analysts per firm-year
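
To make the analyst benchmark concrete, here is a minimal sketch of how a directional call could be derived from individual forecasts. Using the median as the consensus measure is my assumption, and the function name and inputs are hypothetical. Only the three-analyst minimum comes from the description above.

```python
from statistics import median
from typing import Optional

MIN_ANALYSTS = 3  # from the text: at least three analysts per firm-year


def consensus_direction(current_eps: float, forecasts: list[float]) -> Optional[str]:
    """Return the direction implied by the consensus, or None if coverage is too thin."""
    if len(forecasts) < MIN_ANALYSTS:
        return None
    consensus = median(forecasts)  # assumption: median as the consensus measure
    return "increase" if consensus > current_eps else "decrease"


# Made-up numbers for illustration only.
call = consensus_direction(2.00, [2.10, 2.40, 1.90])
```

A firm-year covered by fewer than three analysts returns `None` and would drop out of the comparison.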

One of the most innovative aspects of this study was the use of Chain-of-Thought (CoT) prompting. Unlike a simple request to "analyze financial statements", the CoT method guided ChatGPT step by step through a logical reasoning process, similar to how a professional analyst would approach the task.

The process included:

- Identifying Key Financial Trends: The model examined changes in critical financial statement line items over time.
- Computing Financial Ratios: Key metrics like operating efficiency, liquidity, and leverage ratios were calculated.
- Generating Narrative Insights: The model interpreted these ratios in a human-like manner, explaining their implications.
- Making a Final Prediction: ChatGPT predicted whether earnings would increase or decrease, providing a confidence score.
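
The four steps above can be turned into a prompt template along the following lines. This is a sketch of the general idea, not the prompt used by Kim et al. (2024). The wording of each step, the 0-to-1 confidence scale, and the `build_cot_prompt` helper are my own assumptions.

```python
# Illustrative step wording; NOT the paper's actual prompt text.
COT_STEPS = [
    "Identify notable changes in the financial statement line items over time.",
    "Compute key financial ratios (operating efficiency, liquidity, leverage).",
    "Interpret the ratios and explain what they imply for the firm.",
    "Predict whether earnings will increase or decrease next year, "
    "and give a confidence score between 0 and 1.",  # scale is an assumption
]


def build_cot_prompt(balance_sheet: str, income_statement: str) -> str:
    """Assemble a step-by-step prompt from anonymized statement text."""
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(COT_STEPS, start=1))
    return (
        "You are a financial analyst. Work through the following steps in order.\n"
        f"{numbered}\n\n"
        f"Balance sheet:\n{balance_sheet}\n\n"
        f"Income statement:\n{income_statement}\n"
    )


# Made-up, anonymized inputs (no company name, no dates).
prompt = build_cot_prompt(
    "Total assets: 100 | 90",
    "Revenue: 50 | 45 | 40",
)
```

The resulting string would then be sent to the language model. That call is left out here because it depends on the provider and client library.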

The evaluation of the prediction method was based on two common metrics: accuracy and F1-score.

Accuracy measured the percentage of correctly predicted earnings changes.
F1-score was the harmonic mean of precision (proportion of true positive predictions among all positive predictions) and recall (proportion of true positive predictions among all actual positives).
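
For readers who want to see the two metrics in action, here is a small worked example in plain Python. The labels are made up (1 = earnings increase, 0 = earnings decrease). Treating "increase" as the positive class is an assumption for illustration.

```python
def accuracy_and_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    """Compute accuracy and F1 for binary labels (1 = increase, 0 = decrease)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return accuracy, f1


# Made-up outcomes and predictions for six firm-years.
actual = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(actual, predicted)
```

In this toy case four of six predictions are correct, and precision and recall are both 2/3, so accuracy and F1 both come out at about 0.67.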

When employing the CoT approach, ChatGPT surpassed human analysts in forecasting the direction of future earnings. While financial analysts typically achieved an accuracy of around 53% to 57%, ChatGPT with CoT reached an impressive 60%.

ChatGPT also excelled in scenarios where human analysts often struggle, such as predicting earnings for small firms and loss-making companies. Although the model did better in these difficult cases, human analysts still held an advantage through access to qualitative insights and industry-specific knowledge that ChatGPT lacks.

Interestingly, the study also tested whether ChatGPT could infer firm identities based on financial statements. The result? ChatGPT was only able to identify firms 0.07% of the time - confirming that it was not relying on prior knowledge but instead making predictions purely based on numerical data.

I highly recommend that everyone read the paper themselves and form their own opinion on the results. In conclusion, ChatGPT demonstrates a remarkable ability to analyze financial statements and predict the direction of future earnings changes. Working from purely numerical data, it outperformed human analysts on this task and generated insightful narratives along the way.

While it cannot fully replace human analysts due to its lack of contextual knowledge and economic intuition, it can complement them by reducing cognitive biases and enhancing decision-making.

Additionally, the way prompts are designed significantly impacts performance. As seen in this study, a well-structured Chain-of-Thought approach improved prediction accuracy by roughly 8 percentage points.

I’d love to hear your thoughts on this paper and the future of AI in financial analysis.

JJF

Source:

Kim, A. G., Muhn, M. and Nikolaev, V. V. (2024). Financial Statement Analysis with Large Language Models. SSRN Electronic Journal. http://dx.doi.org/10.2139/ssrn.4835311.
