A forest of opinions: A multi-model ensemble-HMM voting framework for market regime shift detection and trading

Rethyam Gupta; Sarthak Kapoor; Himank Gupta; Srinivasan Natesan

doi:10.3934/DSFE.2025019

Data Science in Finance and Economics

2025, Volume 5, Issue 4: 466-501. doi: 10.3934/DSFE.2025019

Research article

1. School of Operations Research and Information Engineering, Cornell University, Ithaca, NY 14850, USA
2. Department of Computer Science and Engineering, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India
3. Department of Mathematics, Indian Institute of Technology Guwahati, Guwahati, Assam 781039, India

Received: 04 April 2025; Revised: 13 September 2025; Accepted: 16 October 2025; Published: 29 October 2025
JEL Codes: C52, C53, C58

In this paper, we present a framework for detecting market regime shifts using a combination of tree-based ensemble learning models and classical statistical techniques. Specifically, we leverage homogeneous ensemble methods (bagging and boosting) alongside the hidden Markov model (HMM) to identify transitions between different market states (e.g., bull, bear, and neutral). We further propose hybrid voting classifiers that integrate the HMM with specific ensemble learning models to enhance the robustness of regime classification. The model incorporates a comprehensive set of macroeconomic and technical market indicators to provide a holistic view of the underlying market dynamics. Although our primary objective is not to optimize for maximum profitability, we demonstrate that the identified regimes can be utilized effectively to construct a viable trading strategy. Our results, based on exchange-traded funds (ETFs) representing the Russell 3000 and the Standard and Poor's 500 (S&P 500) index, indicate that regime-aware strategies developed through our modeling framework can effectively support informed investment decision-making.
Keywords: hidden Markov model; ensemble learning; voting classifier; market regime shift detection; trading strategies; machine learning in finance
Citation: Rethyam Gupta, Sarthak Kapoor, Himank Gupta, Srinivasan Natesan. A forest of opinions: A multi-model ensemble-HMM voting framework for market regime shift detection and trading[J]. Data Science in Finance and Economics, 2025, 5(4): 466-501. doi: 10.3934/DSFE.2025019
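
The hybrid voting idea described in the abstract can be illustrated with a small sketch. The abstract does not specify how the HMM and the ensemble models are combined, so the weighted soft-voting rule below is an assumption for illustration only, not the authors' method. The function name `soft_vote`, the model names, the weights, and the probability values are all hypothetical; the regime labels come from the example states named in the abstract.

```python
from typing import Dict, Sequence

# Regime labels taken from the example states in the abstract.
REGIMES = ("bull", "bear", "neutral")


def soft_vote(
    probabilities: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> str:
    """Return the regime with the highest weighted-average probability.

    `probabilities` maps a model name to its per-regime probabilities,
    ordered as in REGIMES. `weights` maps the same names to non-negative
    voting weights. Both are hypothetical inputs, not values from the paper.
    """
    total_weight = sum(weights[name] for name in probabilities)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = [0.0] * len(REGIMES)
    for name, probs in probabilities.items():
        if len(probs) != len(REGIMES):
            raise ValueError(f"{name}: expected {len(REGIMES)} probabilities")
        for i, p in enumerate(probs):
            combined[i] += weights[name] * p / total_weight

    # Pick the index of the largest combined probability.
    best = max(range(len(REGIMES)), key=lambda i: combined[i])
    return REGIMES[best]


# Hypothetical per-model outputs for a single trading day.
day = {
    "hmm": [0.2, 0.7, 0.1],
    "random_forest": [0.5, 0.3, 0.2],
    "boosting": [0.3, 0.5, 0.2],
}
equal_weights = {"hmm": 1.0, "random_forest": 1.0, "boosting": 1.0}
regime = soft_vote(day, equal_weights)
```

With equal weights, the bear regime has the largest averaged probability in this made-up example. Changing the weights shifts how much the HMM posterior counts relative to the tree-based models, which is one plausible way a hybrid classifier could trade off the two model families.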

© 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)