Research article

Exploratory mean-variance portfolio selection with constant elasticity of variance models in regime-switching markets

  • Published: 27 January 2026
  • 93E20, 93E35

  • We studied a continuous-time exploratory mean-variance portfolio optimization problem using a reinforcement learning framework. The problem was set in a Markov regime-switching financial market, which captured time-varying market characteristics. Under different regimes, the risky asset followed constant elasticity of variance dynamics with regime-dependent parameters. Within this setting, the exploratory mean-variance portfolio optimization problem was formulated as a stochastic control problem. Stochastic dynamic programming techniques were employed to derive the Hamilton–Jacobi–Bellman equation associated with the exploratory control problem. We first derived analytical solutions for the optimal investment strategy and the corresponding value function. Although these solutions admitted closed-form representations, they were expressed in an integral form, which made them difficult to implement directly in practical numerical computations. Because of this, we developed a reinforcement learning algorithm to approximate the optimal investment policy and the corresponding value function. Based on the structural properties of the analytical solutions, we established the convergence of both the investment policy and the value function by invoking the policy improvement theorem. This result provided a rigorous theoretical foundation for the proposed algorithm. In the algorithmic implementation, linear function approximation was employed to parameterize both the value function and the investment policy. Finally, numerical experiments were conducted to verify the convergence behavior of the proposed algorithm and to demonstrate its effectiveness in solving the considered portfolio optimization problem.
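To make the market model in the abstract concrete, the following is a minimal simulation sketch of a risky asset whose constant elasticity of variance (CEV) parameters switch with a Markov chain. It is an illustration under stated assumptions, not the authors' implementation: the parametrization dS = mu_i S dt + sigma_i S^(beta_i + 1) dW, the Euler-Maruyama discretization, the first-order approximation of the regime transition probabilities, the function name `simulate_regime_switching_cev`, and all numerical values are hypothetical choices made here.

```python
import numpy as np


def simulate_regime_switching_cev(s0, mu, sigma, beta, Q, T=1.0,
                                  n_steps=252, regime0=0, seed=0):
    """Euler-Maruyama path of dS = mu_i S dt + sigma_i S^(beta_i + 1) dW.

    The regime index i follows a continuous-time Markov chain with
    generator Q. This parametrization is an assumption made for
    illustration; the article may use a different convention.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    n_regimes = len(mu)
    # First-order approximation of the one-step transition matrix:
    # P ~ I + Q * dt. Valid only when dt is small relative to 1 / |Q_ii|.
    P = np.eye(n_regimes) + np.asarray(Q, dtype=float) * dt

    prices = np.empty(n_steps + 1)
    regimes = np.empty(n_steps + 1, dtype=int)
    prices[0], regimes[0] = s0, regime0

    for k in range(n_steps):
        i = regimes[k]
        s = prices[k]
        dw = rng.normal(0.0, np.sqrt(dt))  # Brownian increment
        s_next = s + mu[i] * s * dt + sigma[i] * s ** (beta[i] + 1.0) * dw
        # Floor the price so the fractional power stays well defined.
        prices[k + 1] = max(s_next, 1e-8)
        # Draw the next regime from row i of the transition matrix.
        regimes[k + 1] = rng.choice(n_regimes, p=P[i])
    return prices, regimes


# Hypothetical two-regime example: a calmer regime 0 and a more volatile regime 1.
prices, regimes = simulate_regime_switching_cev(
    s0=100.0,
    mu=[0.08, 0.02],
    sigma=[2.0, 3.0],
    beta=[-0.5, -0.4],
    Q=[[-0.5, 0.5], [1.0, -1.0]],
)
print(prices[-1], regimes[:10])
```

With a negative elasticity parameter, the local volatility sigma_i S^(beta_i) rises as the price falls, which is the feature that distinguishes CEV dynamics from geometric Brownian motion. Paths generated this way could serve as training data for a learning algorithm of the kind the abstract describes.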

    Citation: Xiaoyu Xing, Xingtian Zhang, Jiarou Luo. Exploratory mean-variance portfolio selection with constant elasticity of variance models in regime-switching markets[J]. Journal of Industrial and Management Optimization, 2026, 22(2): 1087-1111. doi: 10.3934/jimo.2026040



  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)