Investors face uncertainty when constructing a diversified portfolio with traditional techniques. This study investigates and compares the effectiveness of two traditional Markowitz models and two machine learning techniques in optimizing diversified stock portfolios. The Markowitz models are the Modern Portfolio Theory Efficient Frontier and Quadratic Mean-Variance Programming. The machine learning models are the Random Forest classifier and the eXtreme Gradient Boosting (XGBoost) classifier. Historical stock market time series for South African companies listed on the Johannesburg Stock Exchange (JSE) were analyzed to identify patterns and allocate stock weights, informing portfolio diversification decisions that seek to minimize risk while maximizing returns. The results are benchmarked against the JSE Top 40 Index. The portfolio generated by the XGBoost model performed best, achieving a Sharpe ratio of 1.44, followed by the portfolio generated by the Random Forest classifier with a Sharpe ratio of 1.30. The portfolios generated by the Modern Portfolio Theory Efficient Frontier and Quadratic Mean-Variance Programming had Sharpe ratios of 0.89 and 0.41, respectively. The JSE Top 40 Index portfolio had the lowest Sharpe ratio, 0.35. These results support the superiority of the machine learning models over the traditional Markowitz models. Nevertheless, traditional models remain fundamental to understanding the principles of portfolio diversification. Investors may prioritize machine learning models primarily because of their ability to capture stochastic insights and market dynamics.
Citation: Esau Moyoweshumba, Modisane Seitshiro. Leveraging Markowitz, random forest, and XGBoost for optimal diversification of South African stock portfolios[J]. Data Science in Finance and Economics, 2025, 5(2): 205-233. doi: 10.3934/DSFE.2025010
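The abstract compares portfolios by their Sharpe ratios. As a rough illustration of that metric, the following minimal Python sketch computes an annualized Sharpe ratio from periodic returns, together with the closed-form minimum-variance weights for a two-asset Markowitz portfolio. The function names, the per-period risk-free rate, and the 252-period annualization factor are assumptions made for this sketch; the abstract does not state the conventions the authors used, and this is not their implementation.

```python
import math
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic simple returns.

    Assumptions of this sketch: risk_free_rate is quoted per period,
    and annualization scales by sqrt(periods_per_year).
    """
    excess = [r - risk_free_rate for r in returns]
    mean_excess = statistics.fmean(excess)
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0:
        raise ValueError("zero volatility: Sharpe ratio is undefined")
    return mean_excess / volatility * math.sqrt(periods_per_year)


def portfolio_returns(asset_returns, weights):
    """Per-period portfolio return; each row holds one period's asset returns."""
    if not math.isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must sum to 1")
    return [sum(w * r for w, r in zip(weights, row)) for row in asset_returns]


def min_variance_weights_two_assets(var_a, var_b, cov_ab):
    """Closed-form minimum-variance weights for two assets.

    Short positions are allowed in this simplified case.
    """
    denom = var_a + var_b - 2 * cov_ab
    if denom == 0:
        raise ValueError("assets are perfectly correlated with equal variance")
    w_a = (var_b - cov_ab) / denom
    return [w_a, 1 - w_a]
```

A portfolio's Sharpe ratio would then be obtained by passing the output of `portfolio_returns` to `sharpe_ratio`. With more than two assets, the minimum-variance problem is usually solved numerically as a quadratic program rather than in closed form.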