Forecasting the stock market accurately remains a difficult task because market data are intricate and constantly changing. This research seeks to improve forecast accuracy with ensemble machine-learning models that exploit the strengths of the individual models while offsetting their weaknesses. The study tests the models across six major global markets (S&P 500, NASDAQ, DAX, FTSE, Nikkei 225, and Hang Seng), providing insight into ensemble performance in diverse economic contexts. We used gradient boosting, decision trees, neural networks, and Bayesian ridge models to predict stock prices, and combined them through voting and stacking strategies. The models were assessed using mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the a20 index. Analysis of variance (ANOVA) followed by Tukey's honest significant difference (HSD) tests was performed to validate performance differences among the tested models. In the global market, the stacking ensemble achieved the lowest MAE (0.0332), followed closely by the voting ensemble (0.0347). These results suggest that ensemble models predict stock prices more accurately than standalone and hybrid models.
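As a rough illustration of the evaluation metrics named in the abstract, the following minimal Python sketch computes MAE, MSE, RMSE, and the a20 index for two sets of predictions and for their unweighted average. Several things here are assumptions rather than details from the study: the function names (`mae`, `mse`, `rmse`, `a20_index`, `voting_average`) are hypothetical, the toy numbers are invented purely for illustration, the a20 index is taken in its usual form as the share of samples whose predicted-to-actual ratio lies between 0.80 and 1.20, and voting is modeled as a simple unweighted mean, which may differ from the voting scheme the authors used.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return sqrt(mse(y_true, y_pred))


def a20_index(y_true, y_pred):
    """Share of samples whose predicted/actual ratio lies in [0.80, 1.20].

    Assumes every actual value is nonzero (a zero would raise
    ZeroDivisionError).
    """
    hits = sum(1 for t, p in zip(y_true, y_pred) if 0.80 <= p / t <= 1.20)
    return hits / len(y_true)


def voting_average(*model_predictions):
    """Unweighted voting: average the models' predictions per sample.

    Assumption: the study's voting ensemble may weight models differently.
    """
    return [sum(preds) / len(preds) for preds in zip(*model_predictions)]


# Toy data invented for illustration only; not taken from the study.
y_true = [1.0, 2.0, 4.0, 5.0]
model_a = [1.1, 1.8, 4.4, 3.5]
model_b = [0.9, 2.6, 3.8, 5.0]
voting = voting_average(model_a, model_b)

for name, preds in [("model_a", model_a), ("model_b", model_b), ("voting", voting)]:
    print(
        f"{name}: MAE={mae(y_true, preds):.4f} "
        f"MSE={mse(y_true, preds):.4f} "
        f"RMSE={rmse(y_true, preds):.4f} "
        f"a20={a20_index(y_true, preds):.2f}"
    )
```

Lower MAE, MSE, and RMSE indicate smaller errors, while an a20 index closer to 1 indicates that more predictions fall within 20% of the actual values. A stacking ensemble would replace the plain average with a meta-model trained on the base models' predictions; that step is omitted from this sketch.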
Citation: Akila Dabara Kayit, Mohad Tahir Ismail. Leveraging hybrid ensemble models in stock market prediction: A data-driven approach[J]. Data Science in Finance and Economics, 2025, 5(3): 355-386. doi: 10.3934/DSFE.2025015