Financial institutions often hesitate to use complex models such as random forests and extreme gradient boosting (XGBoost) for credit risk assessment because of the difficulty of selecting optimal hyperparameters and the limited interpretability of these 'black box' models. This study addresses these issues by comparing the effects of classical grid search and Bayesian hyperparameter optimisation on model performance and interpretability in credit risk modelling. The results indicate that Bayesian optimisation improves recall for XGBoost, making it more effective at identifying defaulters, while offering no notable advantage for random forests and even reducing performance for logistic regression. Although Bayesian optimisation reduces the computational time required to find optimal hyperparameters, it does not improve the models' discriminatory power (area under the curve). The findings also suggest that the choice of optimisation technique influences feature importance and ranking, with SHapley Additive exPlanation (SHAP) values revealing that slight hyperparameter adjustments can lead to substantial changes in feature importance, particularly for logistic regression. However, the partial dependence plots (PDPs) for the variable Rate under the Bayesian- and classically optimised models are similar, indicating that the choice of optimisation technique does not alter the relationship between features and default probability. These insights emphasise the importance of selecting an appropriate optimisation approach to balance model performance and explainability, with significant implications for credit risk modelling decisions.
Citation: Tatenda Shoko, Tanja Verster, Lindani Dube. Comparative analysis of classical and Bayesian optimisation techniques: Impact on model performance and interpretability in credit risk modelling using SHAP and PDPs[J]. Data Science in Finance and Economics, 2025, 5(3): 320-354. doi: 10.3934/DSFE.2025014
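As a brief, hedged illustration of the workflow summarised in the abstract, the sketch below tunes an XGBoost classifier with scikit-learn's GridSearchCV and with Optuna's Bayesian optimiser, then inspects the tuned model with SHAP values and a partial dependence profile. The synthetic data, hyperparameter ranges, trial budget, and the use of feature index 0 as a stand-in for the Rate variable are illustrative assumptions only, not the study's actual data or configuration.

```python
# Minimal sketch: grid search vs. Bayesian (Optuna) hyperparameter tuning of an
# XGBoost classifier, followed by SHAP importances and a partial dependence
# profile. Data, grids, and trial budget are placeholders, not the study's setup.
import numpy as np
import optuna
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.inspection import partial_dependence
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Placeholder for a credit dataset with an imbalanced default flag.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.85, 0.15],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    random_state=42)

# --- Classical grid search over a small, fixed hyperparameter grid ---
grid = GridSearchCV(
    xgb.XGBClassifier(eval_metric="logloss", random_state=42),
    param_grid={"n_estimators": [100, 300],
                "max_depth": [3, 5],
                "learning_rate": [0.05, 0.1]},
    scoring="roc_auc", cv=3)
grid.fit(X_train, y_train)
print("Grid search best AUC:", grid.best_score_, grid.best_params_)

# --- Bayesian optimisation of the same hyperparameters with Optuna ---
def objective(trial):
    params = {"n_estimators": trial.suggest_int("n_estimators", 100, 400),
              "max_depth": trial.suggest_int("max_depth", 2, 6),
              "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.2, log=True)}
    model = xgb.XGBClassifier(eval_metric="logloss", random_state=42, **params)
    return cross_val_score(model, X_train, y_train, cv=3, scoring="roc_auc").mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=25)
print("Bayesian best AUC:", study.best_value, study.best_params)

# Refit the Bayesian-tuned model for the interpretability checks.
best = xgb.XGBClassifier(eval_metric="logloss", random_state=42, **study.best_params)
best.fit(X_train, y_train)

# SHAP: mean absolute contribution per feature gives a global importance ranking.
shap_values = shap.TreeExplainer(best).shap_values(X_test)
print("SHAP importance ranking:", np.argsort(-np.abs(shap_values).mean(axis=0)))

# Partial dependence of the predicted default probability on one feature
# (index 0 here is an assumed stand-in for the study's 'Rate' variable).
pdp = partial_dependence(best, X_test, features=[0])
print("PDP averages for feature 0:", np.round(pdp["average"][0], 3))
```

Repeating the SHAP and PDP steps for the grid-searched model (grid.best_estimator_) allows the two optimisation routes to be compared on both importance rankings and feature-response shapes, which is the kind of contrast the study draws.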