Leveraging Markowitz, random forest, and XGBoost for optimal diversification of South African stock portfolios

Esau Moyoweshumba; Modisane Seitshiro

doi:10.3934/DSFE.2025010

Data Science in Finance and Economics

2025, Volume 5, Issue 2: 205-233. doi: 10.3934/DSFE.2025010

Research article

Esau Moyoweshumba ^1, Modisane Seitshiro ^2,3

1. African Institute for Mathematical Sciences, Muizenberg, 7945, Cape Town, South Africa
2. Centre for Business Mathematics and Informatics, North-West University, Potchefstroom 2531, South Africa
3. National Institute for Theoretical and Computational Sciences, Potchefstroom 2531, South Africa

Received: 14 November 2024 Revised: 29 March 2025 Accepted: 08 April 2025 Published: 20 May 2025
JEL Codes: C32, C58, C61, C63, D53, G11

Investors face uncertainty when constructing diversified portfolios with traditional techniques. This study compares the effectiveness of two traditional Markowitz models and two machine learning techniques in optimizing diversified stock portfolios. The Markowitz models employed were the modern portfolio theory (MPT) efficient frontier and quadratic mean-variance programming. The machine learning models were the random forest classifier and the eXtreme Gradient Boosting (XGBoost) classifier. Historical time series for South African companies listed on the Johannesburg Stock Exchange (JSE) were analyzed to identify patterns and allocate stock weights, informing portfolio diversification decisions that aim to minimize risk while maximizing returns. The results are benchmarked against the JSE Top 40 Index. The portfolio generated by the XGBoost model performed best, achieving a Sharpe ratio of 1.44, followed by the portfolio generated by the random forest classifier with a Sharpe ratio of 1.30. The portfolios generated by the MPT efficient frontier and quadratic mean-variance programming had Sharpe ratios of 0.89 and 0.41, respectively. The JSE Top 40 Index portfolio had the lowest Sharpe ratio, 0.35. These results support the superiority of the machine learning models over the traditional Markowitz models in this setting. Nevertheless, traditional models remain fundamental to understanding the principles of portfolio diversification. Investors may prioritize machine learning models primarily because of their ability to capture stochastic insights and market dynamics.
Keywords: diversification, efficient frontier, modern portfolio theory, optimization, random forest, returns, Sharpe ratio, volatility, XGBoost
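
The abstract ranks the portfolios by Sharpe ratio and refers to mean-variance optimization. As a rough illustration only, and not the authors' implementation, the sketch below computes an annualized Sharpe ratio and the closed-form global minimum-variance weights on synthetic returns. The helper names `sharpe_ratio` and `min_variance_weights`, the 252-period annualization, the zero risk-free rate, and the simulated data are all assumptions made for this example. The closed form allows short positions, so it is a simplification of the constrained quadratic programming described in the abstract.

```python
import numpy as np


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of periodic returns.

    Hypothetical helper: risk_free_rate is an annual rate, and 252 trading
    periods per year is an assumed convention, not taken from the article.
    """
    excess = np.asarray(returns, dtype=float) - risk_free_rate / periods_per_year
    return np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1)


def min_variance_weights(cov):
    """Global minimum-variance weights w = S^-1 1 / (1' S^-1 1).

    Only the budget constraint (weights sum to 1) is imposed, so individual
    weights may be negative (short positions).
    """
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)  # solves S x = 1 without forming S^-1
    return raw / raw.sum()


# Synthetic daily returns for three hypothetical assets (not JSE data).
rng = np.random.default_rng(0)
asset_returns = rng.normal(loc=0.0005, scale=0.01, size=(500, 3))

cov = np.cov(asset_returns, rowvar=False)
weights = min_variance_weights(cov)
portfolio_returns = asset_returns @ weights

print("weights:", weights)
print("annualized Sharpe ratio:", sharpe_ratio(portfolio_returns))
```

A higher Sharpe ratio means more excess return per unit of volatility, which is how the five portfolios in the abstract are compared.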
Citation: Esau Moyoweshumba, Modisane Seitshiro. Leveraging Markowitz, random forest, and XGBoost for optimal diversification of South African stock portfolios[J]. Data Science in Finance and Economics, 2025, 5(2): 205-233. doi: 10.3934/DSFE.2025010

© 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)