Accurate earnings forecasts are crucial for successful investment outcomes, especially in emerging markets like Poland, where analyst coverage is limited. This study investigated the applicability of natural language processing (NLP) techniques, specifically FastText and FinBERT word embeddings, combined with a gradient-boosting decision tree (XGBoost) machine learning algorithm, to forecast earnings per share (EPS) for companies listed on the Warsaw Stock Exchange from 2010 to 2019. The performance of these models was compared with a seasonal random walk (SRW) model. The SRW model consistently demonstrated the lowest error rates, as measured by the mean arctangent absolute percentage error, and outperformed the NLP-based models across different periods and error metrics. The superior performance of the simple SRW model can be attributed to the overparameterization and overfitting tendencies of the complex NLP models, as well as the relatively straightforward dynamics of the Polish stock market. The findings suggest that the application of sophisticated NLP techniques for EPS forecasting in Poland may not be justified, and that the SRW model provides a more accurate representation of the market's behavior.
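The benchmark and error metric named above can be sketched in a few lines. This is a minimal illustration, not the study's implementation. It assumes quarterly EPS data (so a seasonal lag of four) and the usual definition of the mean arctangent absolute percentage error, the mean of arctan(|(actual - forecast) / actual|). The function names, the handling of zero actual values, and the sample figures are all hypothetical.

```python
import math


def seasonal_random_walk(eps, season=4):
    """Forecast each period's EPS as the value observed `season` periods earlier.

    Assumption: quarterly data, so the default seasonal lag is 4.
    """
    return [eps[t - season] for t in range(season, len(eps))]


def maape(actual, forecast):
    """Mean arctangent absolute percentage error, in radians (range 0 to pi/2)."""
    errors = []
    for a, f in zip(actual, forecast):
        if a == 0:
            # Assumption: when the actual value is zero, use the arctan limit
            # pi/2 for a non-zero forecast and 0 when the forecast is also zero.
            errors.append(0.0 if f == 0 else math.pi / 2)
        else:
            errors.append(math.atan(abs((a - f) / a)))
    return sum(errors) / len(errors)


# Hypothetical quarterly EPS for one company over two years (not study data).
eps = [1.0, 2.0, 1.5, 3.0, 1.0, 4.0, 1.5, 0.0]

forecast = seasonal_random_walk(eps)  # forecasts for quarters 5 to 8
actual = eps[4:]                      # realized EPS for the same quarters
score = maape(actual, forecast)
print(f"MAAPE of the seasonal random walk: {score:.4f}")
```

Because the arctangent bounds each term at pi/2, a quarter with near-zero actual EPS cannot dominate the average the way it would under an ordinary percentage error, which makes the metric usable for earnings series that cross or touch zero.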
Citation: Wojciech Kurylek. Are Natural Language Processing methods applicable to EPS forecasting in Poland?[J]. Data Science in Finance and Economics, 2025, 5(1): 35-52. doi: 10.3934/DSFE.2025003