
In this study we investigate alternative interpretable machine learning ("IML") models in the context of probability of default ("PD") modeling for the large corporate asset class. IML models have become increasingly prominent in highly regulated industries where there are concerns over the unintended consequences of deploying black box models that may be deemed conceptually unsound. In the context of banking and in wholesale portfolios, there are challenges around using models where the outcomes may not be explainable, both in terms of the business use case as well as meeting model validation standards. We compare various IML models (deep neural networks and explainable boosting machines), including standard approaches such as logistic regression, using a long and robust history of corporate borrowers. We find that there are material differences between the approaches in terms of dimensions such as model predictive performance and the importance or robustness of risk factors in driving outcomes, including conflicting conclusions depending upon the IML model and the benchmarking measure considered. These findings call into question the value of the modest pickup in performance with the IML models relative to a more traditional technique, especially if these models are to be applied in contexts that must meet supervisory and model validation standards.
Citation: Michael Jacobs, Jr. Benchmarking alternative interpretable machine learning models for corporate probability of default[J]. Data Science in Finance and Economics, 2024, 4(1): 1-52. doi: 10.3934/DSFE.2024001
Machine learning ("ML") models and algorithms have become predominant in several industries, notably including credit risk management. Following a long period of resistance from supervisors and model risk managers, there is at present a transition from academia to the credit risk practice, ranging from model development to various other applications in this domain. Since with this movement comes a new set of uncertainties and other difficulties (e.g., transparency around what drives model outcomes), the current focus of research is in the design of ML models that meet business requirements and supervisory expectations.
A view gaining acceptance is that there is no bright line between ML and traditional statistical models, considering that even standard regression models may be extremely complex and not readily interpretable (Breeden, 2021). Some examples include factor variables, spline approximations, interaction terms and numerous descriptive input variables. Therefore, it can be argued that what distinguishes ML from traditional statistical algorithms are optimization methodologies developed long ago in fields apart from those where standard econometric models have been applied. These include techniques such as bagging, boosting and random forests that are related to so-called ensemble methods (Clemen, 1989; Opitz and Maclin, 1999). We can gain further insight into these differences by considering the taxonomy proposed by Harrell (2018):
● Uncertainty: Statistical models specify a probability law governing the data generation process that induces the model uncertainty.
● Structure: A parametric specification is typically imposed in statistical models, for example linearity of the target variable, or of the parameter estimates, with respect to the explanatory variables.
● Empiricism: ML is more empirical, admitting high-order interactions that are not prespecified, in contrast to statistical models that isolate the effects of key parameters.
In the case of credit risk modeling, particularly for non-retail asset classes such as commercial and industrial, the datasets at hand are usually limited in depth. This is illustrated by the survey of credit scorecard models performed by Lessmann et al. (2015), who find that about 90% of the studies reviewed used fewer than 10K observations. This stands in contrast to other "big data" domains where there is more emphasis on extreme non-linearities, such as image processing (Krizhevsky et al., 2012) and natural language processing (Collobert and Weston, 2008). That said, ML has gained some traction in small dataset settings, for example through emphasizing concepts such as model robustness or the use of simplified interaction effects (Breeden, 2021).
While ML techniques are gaining traction in the wider domain of credit risk, the gains have been relatively more limited in PD modeling, and even less so in wholesale as contrasted with retail asset classes. Non-wholesale PD applications have been in areas such as alternative banking channels (Abdulrahman et al., 2014), social media (Allen et al., 2020) or mobile phone use (Bjorkegren and Grissen, 2020). Indirect applications in PD modeling include ML algorithms that preprocess deposit histories (American Banking Association, 2018), creating input factors that are then used in traditional methods such as logistic regression. Looking at applications of ML to risk beyond credit, we find methodologically kindred areas such as fraud detection (Zhou et al., 2018) or anti-money laundering (Li et al., 2020).
The U.S. banking supervisors issued a request for information and comment on the use of ML (U.S. Banking Regulatory Agencies, 2021) where one of the top questions relates to the lack of explainability (also termed interpretability) in some ML approaches and applications (e.g., fair lending). Furthermore, a less transparent and explainable approach might result in difficulties in evaluating the conceptual soundness of a model, which is an important model risk management consideration outlined in SR11-7/OCC11-12 (U.S. Banking Regulatory Agencies, 2011). In August 2021, the Office of the Comptroller of the Currency ("OCC") released its model risk handbook (OCC, 2021), which prescribes that, for banks employing models deemed to be complex such as credit rating systems, examiners are expected to determine whether explainability and transparency are accounted for as key considerations in managing model risk. While ML models that are not interpretable (or "black-box") are by construction neither explainable nor transparent, there are model-agnostic methods such as local interpretable model-agnostic explanations (Ribeiro et al., 2016). Similarly, methods such as Shapley additive explanations (Lundberg and Lee, 2017) provide approximate explanations, but as with the former technique there is reason to exercise caution (Molnar et al., 2020). Analogous to the no free lunch theorem in ML, it can be shown that there exists no model-agnostic and universally applicable notion of explainability, and an extensive academic literature criticizes uncritical applications of these methods (Kumar et al., 2020; Rudin, 2019; Slack et al., 2020). However, there is a class of inherently interpretable ML models with model-based explainability, meaning that the model is transparent and self-explanatory (Sudjianto and Zhang, 2021). Yang et al. (2021a) argue that the inherent interpretability of complex models should rely on constraints of a practical nature, which the authors extend to a methodology for qualitative assessment of the interpretability of ML models.
Credit risk modeling in an ML context is typically assumed to involve large volumes of training data, which is more common in the data-rich domain of the retail asset classes, whereas in wholesale asset classes (e.g., corporate or commercial real estate) the situation is in fact similar albeit with fewer defaults as well as fewer risk factors of a standardized nature. While it is widely accepted that in the wholesale context one of the leading uses of ML is the standardization of risk factors that may be rather heterogeneous, there in fact exist several large datasets in this asset class, and there is a large body of published research that has applied ML in model building (Vahid and Ahmadi, 2016; Anagnostou et al., 2020). There is a distinction between bankruptcy and default, and since bankruptcies are public events the focus of most research is on modeling that event (Odom and Sharda, 1990; Coats and Fant, 1993; Mckee, 2000; Min and Lee, 2005; Vassiliou, 2013), with a wide range of ML techniques tested. Models for lending to small- and medium-sized enterprises ("SMEs") fall in between consumer and commercial approaches, because performance is more closely tied to a small group of owners. Although less data is available for SMEs, there has been some research in applying ML in this domain (Li et al., 2016; Zhu et al., 2017), with findings commercialized by the fintech industry in this market leveraging ML methods and alternative data sources.
In this study we investigate alternative interpretable machine learning ("IML") models in the context of probability of default (PD) modeling using a dataset of corporate borrowers. IML models have become increasingly prominent in highly regulated industries where there are concerns over the unintended consequences of black box models that may be deemed conceptually unsound. In the context of banking and in wholesale portfolios, there are challenges around deploying models where the outcomes may not be explainable, both in terms of the business use case as well as meeting model validation standards. We compare various IML models, including standard approaches such as logistic regression, using a long and deep history of corporate borrowers sourced from Moody's and studied in Jacobs (2022a, 2022b). This data consists of approximately 200K quarterly observations of North American-based large corporate obligors having public credit ratings over the period 1990–2015. This dataset contains a large set of potential explanatory variables, including financial and macroeconomic risk factors.
In our comparison of various IML models (deep neural networks and explainable boosting machines), including standard approaches such as logistic regression, we find that there are material differences between the approaches in terms of dimensions such as model predictive performance and the importance or robustness of risk factors in driving outcomes, including conflicting conclusions depending upon the IML model and the interpretability or robustness measure considered. While we observe that the IML models all demonstrate some pickup in performance relative to a more traditional technique, the degree of this outperformance is modest, especially on an out-of-sample basis. We also observe, in a comparison of interpretability measures across the IML models and a more traditional technique, a pervasive lack of consistency across both models and measures, which calls into question the value of this pickup in performance with the IML models, especially if these models are to be applied in contexts that must meet model validation standards.
This paper proceeds according to the following outline. A review of the literature is presented in Section 2. In Section 3 we present the modeling methodology, where we introduce the alternative IML frameworks that we will compare empirically. In Section 4 we describe the modeling data. Section 5 encompasses the estimation results and benchmarking analysis. In Section 6 we conclude, address implications for policy and propose related directions for future research.
In this review of the relevant literature, we first cover some of the foundational literature in credit risk modeling, and then proceed to ML research as it applies to this study.
The seminal study in PD modeling that introduced the industry standard PD scoring model was presented by Altman (1968), who applied the methodology of multiple discriminant analysis ("MDA"). While MDA has the advantage of computational convenience, since it assumes a Gaussian error structure and a linear model specification, we note that this edge is marginal at best given recent advances in computational capabilities. Mester (1997) notes the prevalence and growth of such models in U.S. banking. Altman and Narayanan (1997) find that, spanning geographies and borrower types, such approaches are remarkably similar. A popular vendor PD scorecard model, very common in banking, is the Moody's Analytics ("MA") model for private firms (Dwyer et al., 2004), which is considered very adaptable.
Merton (1974) takes a rather different approach than credit scoring, modeling a levered firm's equity as a call option on the assets of the firm, where the strike price is equal to the debt owed, an approach based on option pricing theory. The PD is estimated iteratively in a method that extracts the unobserved value and volatility of assets, using the amount of debt owed at some point in the future, in the process deriving the distance-to-default ("DTD") construct. DTD is understood as the number of standard deviations separating the asset value at a point in time from the value of the debt obligations, so that DTD is inversely related to the PD. There are strict assumptions associated with this construct that have been addressed in the subsequent literature. A popular vendor model based upon this approach is MA's CreditEdgeTM ("CE") model for firms with publicly traded equity, which is calibrated to stock prices and observed defaults to derive the expected default frequency ("EDF"). Since the EDF is based upon equity market data, it is a more volatile measure as compared to PD ratings derived from credit scoring models.
An alternative to the above is the reduced form approach that applies intensity modeling to derive a stochastic hazard rate (Jarrow and Turnbull, 1995; Duffie and Singleton, 1999). This approach differs from the structural models by eschewing the underlying economics driving the default process, instead decomposing prices of defaultable debt as a means of estimating a random intensity process. This has the benefit of not being restricted by the assumptions at play in the structural approach, but has the downside of potentially measuring risks apart from credit, such as liquidity risk premia, that complicate the construct. The Kamakura Risk ManagerTM vendor model is in this class, which statistically implements the reduced form Jarrow-Chava model (Chava and Jarrow, 2004). This version of the reduced form approach adjusts explicitly for the aforementioned liquidity risk premia, but at the cost of incorporating embedded optionality noise and other market distortions.
There is a substantial ML literature apart from the domain of PD modeling whose techniques have subsequently found application in that area. Clemen (1989) surveys the considerable literature regarding the combination of forecasts and concludes that forecast accuracy can be substantially improved through the combination of multiple individual forecasts. This author considers contributions from the forecasting, psychology, statistics, and management science literatures. Opitz and Maclin (1999) consider an ensemble of a set of individually trained classifiers whose predictions are combined when classifying novel instances, such as bagging and boosting, and evaluate these methods on 23 datasets using both neural networks and decision trees as the classification algorithm. The authors find that bagging is almost always more accurate than a single classifier but sometimes much less accurate than boosting, while the performance of boosting is dependent on the characteristics of the dataset being examined.
Turning to applications of ML techniques to non-wholesale PD modeling, fuzzy logic is applied in Abdulrahman et al. (2014) in the context of micro-finance in a developing country. The American Banking Association (2018; "ABA") announced a new credit score, called the UltraFICO score, as part of a bid to ensure that creditworthiness is better reflected through credit scoring, drawing on consumer-contributed data (e.g., information from checking and savings accounts) that reflects responsible financial management. The ABA argues that the new score could improve credit access for the majority of Americans and is particularly relevant for those who fall in the grey area in terms of credit scores or fall just below a lender's score cut-off. Allen et al. (2020) consider social networks associated with the demand for and supply of consumer and small business loans originated on lending marketplaces, finding that loan demand increases substantially with past borrowing activities of geographically distant but socially connected areas, whereas borrower-area social proximity to deposits increases funding likelihood by 5.61% and improves ex-post loan performance. The authors argue that social networks improve capital allocation by increasing awareness of alternative lending platforms and facilitating the transmission of less accessible information complementary to loan-specific data.
There is a deep stream of literature that considers applications of ML techniques to the prediction of corporate defaults or bankruptcies. Vahid and Ahmadi (2016) model a set of credit states (good, past due, overdue and doubtful) defined by the Iranian central bank to model solvency and insolvency rates. The model utilizes a hybrid radial basis function neural network featuring a self-organizing map methodology that is found to perform better than methods such as single- and four-step classifications as well as support vector machines ("SVM"). Anagnostou et al. (2020) propose a method to enhance credit portfolio models based on the model of Merton by incorporating contagion effects using Bayesian network methods while maintaining the convenient representation of factor models. A range of techniques are applied to learn the structure and parameters of financial networks from real default swap data, and the impact on standard risk metrics is estimated in a stylized portfolio. Odom and Sharda (1990) develop a neural network prediction of bankruptcy using financial data from various companies, which is compared to a multivariate discriminant model, where results show some potential for outperformance in this context. Coats and Fant (1993) build a neural network to discriminate solvent from distressed obligors using financial ratio risk factors, with an application to an early distress identification system. Mckee (2000) builds a bankruptcy prediction model based upon rough sets theory with variables identified in prior recursive partitioning research, which is shown to be 93% accurate in predicting bankruptcies on a large sample of U.S. public companies, as well as 88% accurate on a separate 100-company holdout sample, whereas on the same dataset a baseline recursive partitioning model shows only 65% accuracy. Min and Lee (2005) apply SVMs to the prediction of large public corporate bankruptcy, utilizing a grid-search methodology that features 5-fold cross-validation to optimize the parameters of the SVM kernel function. The authors find that on the same dataset this SVM approach has superior performance to MDA, logistic regression and three-layer fully connected back-propagation neural network models. Vassiliou (2013) studies credit rating agency methodologies through applying fuzzy set theory, demonstrating that a fuzzy economy is admissible if and only if there exists an equivalent martingale measure, and further constructs a forward probability measure under which a default-free security price, once discounted, is a martingale. The author proceeds to apply these findings in modeling the credit migration dynamics of a defaultable bond as an inhomogeneous semi-Markov process having fuzzy states, analyzes the consequences of switching from a physical to a forward probability measure, and implements parameter estimation and calibration on a sample of traded corporate bonds.
There is also a growing literature that considers applying ML techniques to the prediction of SME defaults or bankruptcies. Li et al. (2016) introduce a PD modeling approach for SMEs that is a hybrid model combining logistic regression and artificial neural networks ("ANN"), applied to Finnish firms over the fiscal years 2004 to 2012. Their results suggest that the proposed hybrid model is more accurate than either the ANN or logistic regression models in isolation. Zhu et al. (2017) apply six ML methods (one decision tree method, three ensemble ML methods, a random subspace method and two integrated ensemble ML methods ("IEML")) to predict default for 57 SMEs listed on the Shenzhen and Shanghai Stock Exchanges during the period 2012–2013. They find that the IEML methods outperform either the decision tree or random subspace methods.
There is a body of supervisory guidance for the financial industry that is a motivation for considering IML techniques in credit risk modeling. The interagency guidance on the management of model risk (U.S. Banking Regulatory Agencies 2011) provides comprehensive guidance for banks that transcends model validation, addressing standards for model development, implementation and use. Furthermore, this guidance encompasses governance and control mechanisms such as board and senior management oversight, policies and procedures, controls and compliance, and an appropriate incentive and organizational structure. The bank examiners handbook issued by the U.S. Office of the Comptroller of the Currency (OCC 2021) provides guidance in performing consistent and high-quality model risk management examinations. This guidance presents the concepts and general principles of model risk management, informs and educates examiners about sound model risk management practices that should be assessed during an examination and provides information needed to plan and coordinate examinations on model risk. The interagency request for information (U.S. Banking Regulatory Agencies 2021) gathers information and comments on financial institutions' use of artificial intelligence ("AI") models, including ML models. The purpose of this is to understand respondents' views on the use of AI by financial institutions in their provision of services to customers and for other business or operational purposes; including appropriate governance, risk management, and controls over AI; and any challenges in developing, adopting, and managing AI.
We conclude this review by surveying the relevant literature in the field of IML. Ribeiro et al. (2016) propose the local interpretable model-agnostic explanations ("LIME") technique. This approach explains predictions by fitting an interpretable model locally in the neighborhood of the prediction. Lundberg and Lee (2017) propose Shapley additive explanations ("SHAP") as a means of interpreting predictions, assigning an importance value to each feature corresponding to a prediction. SHAP identifies a novel class of additive feature importance measures and derives a unique solution to this problem having properties considered attractive. It is shown that SHAP subsumes six known techniques that lack such good qualities, with the added benefit of more rapid calibration and a more intuitive expression. In a point-of-view piece, Rudin (2019) contrasts the explainability of black box ML models versus ML models that are inherently interpretable. The author highlights the dangers of using explainable black boxes for important decision making, challenges inherent in interpretable ML and identifies some applications where IML models are competitive with black box ML models. Molnar et al. (2020) point out some downsides of ML model interpretation including poor generalization, feature dependence or spurious causal interpretation. Kumar et al. (2020) apply the game theoretic construct of a cooperative game to feature importance computation in an ML setting, where influence is distributed between input factors, deriving a form of the unique SHAP values characterizing the game. The authors justify this method based upon mathematical properties deemed desirable as well as the applicability of this construct to model explainability. They further demonstrate the mathematical issues that arise in using Shapley values in the context of feature importance, and that while there exist mitigating solutions to these problems, the latter give rise to additional model complexity, such as the necessity of applying causal reasoning. Slack et al. (2020) put forth the argument that post hoc explanation methods reliant upon perturbations of inputs, such as LIME or SHAP, can be shown to be unreliable. In place of those methods they propose a new scaffolding technique that enables an adversary to mask the biases of a given classifier, such that arbitrary desired explanations can be constructed. Sudjianto et al. (2020) apply local linear representation "unwrapping" of the black box of deep ReLU networks, using the concept of an activation pattern capable of disentangling a complex network into a set of local linear models ("LLMs") that are equivalent to the underlying network. As part of this process the authors develop a user-friendly package for implementing this construct that includes feature importance metrics, model diagnostics and simplification of a pre-trained deep rectified linear unit ("ReLU") network. The authors further propose visual interpretations and diagnostics, such as the local linear profile plot, and argue that these tools are an effective means of network simplification. Finally, they implement these methods in simulation exercises, on benchmark training and testing data, as well as in a real-world example of mortgage credit risk modeling. Yang et al.
(2021a) approach the explainability of deep neural networks through employing various constraints on model architecture (projection pursuit with orthogonality constraints, sparse additive subnetworks and smooth function approximations), which gives rise to an enhanced explainable neural network ("ExNN"), and the authors argue that this construct strikes an improved balance between the interpretability and performance of the model. The authors further demonstrate sufficient conditions for identifiability of this ExNN model, and implement estimation of the model using a modified minibatch gradient descent method, featuring a backpropagation algorithm that computes the Cayley transform and derivatives while preserving the orthogonality of the projection. In a simulation exercise featuring six alternative scenarios the authors compare their method with a number of benchmarks (SVM, least absolute shrinkage and selection operator, extreme learning machine, random forest and multilayer perceptron), demonstrating that the ExNN model maintains the flexibility of favorable model performance while achieving augmented interpretability. Yang et al. (2021b) develop an ExNN model based on generalized additive models with structured interactions ("GAMI-Net") and argue that this construct provides a good balance between prediction accuracy and model interpretability. The authors point out that GAMI-Net is a decomposed feedforward network having multiple additive subnetworks, where each subnetwork consists of multiple hidden layers and is architected to capture one main effect or one pairwise interaction. Three interpretability features are analyzed (heredity, sparsity and marginal clarity) and the authors develop an adaptive training algorithm, in which the main effects are trained first and the pairwise interactions are then optimized with respect to the residuals. Numerical experiments on both real-world datasets and synthetic functions show that their model has augmented interpretability and favorable performance as compared to explainable boosting machines as well as other traditional ML models.
In this section we describe the IML techniques that we will benchmark in our application to corporate PD modeling. We first describe the logistic regression modeling ("LRM") technique, which is widely understood in the literature and applied by practitioners, and serves as our base case model against which we compare the other IML models in this paper. Defining the classes $\{\omega_i\}_{i=1}^2$ of the classification problem, we can write the log-odds (also called the logit function) as:
$$\ln\left(\frac{P(\omega_1|x)}{P(\omega_2|x)}\right) = \theta^T x, \tag{1}$$
where $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$ is a vector of risk factors having dimension $k$ and $\theta = (\theta_1, \ldots, \theta_k) \in \mathbb{R}^k$ is a vector of coefficients. Note that we set $x_1 = 1$, which implies that the intercept is absorbed into $\theta$. As $P(\omega_1|x) + P(\omega_2|x) = 1$, we get that:
$$P(\omega_1|x) = \frac{1}{1 + \exp(-\theta^T x)} = \sigma(\theta^T x), \tag{2}$$
where $\sigma(\theta^T x)$ is the logistic sigmoid (also called the sigmoid link) function. This function has properties mathematically equivalent to a cumulative distribution function, as it has range in $(0,1)$ with the real line as its domain, and we may interpret it as a PD conditioned upon $\theta^T x$, where the latter can be viewed as a score since it is positively related to the risk of default.
A standard method to estimate the vector of parameters $\theta$ is maximum likelihood estimation ("MLE"). In this construct we specify a set of training samples defined by features $\{x_n\}_{n=1}^N$ and a classification target $\{y_n\}_{n=1}^N$, with $y_n \in \{0,1\}$, and specify the likelihood function as:
$$P(y_1, \ldots, y_N|\theta) = \prod_{n=1}^N \left(\sigma(\theta^T x_n)\right)^{y_n}\left(1 - \sigma(\theta^T x_n)\right)^{1 - y_n}. \tag{3}$$
Typically we take as our objective the negative log-likelihood function (also called the cross-entropy error), which has computational advantages; since the logarithm is a monotonically increasing transformation of (3), minimizing (4) is equivalent to maximizing the likelihood:
$$L(\theta) = -\sum_{n=1}^N \left[ y_n \ln\left(\sigma(\theta^T x_n)\right) + (1 - y_n)\ln\left(1 - \sigma(\theta^T x_n)\right)\right]. \tag{4}$$
We minimize (4) with respect to $\theta$ by means of iterative methods, examples being Newton's scheme or steepest descent. Importantly, this model has properties of computational convenience and in most cases the estimation is stable. The reason for this is that $\sigma(\theta^T x_n) \in (0,1)$, which implies that the covariance matrix $R$ is positive definite, from which it follows in turn that the Hessian matrix $\nabla^2 L(\theta)$ is positive definite. This means that $L(\theta)$ is convex and the optimization has a unique minimum. A potential issue arises if the development data are linearly separable, a situation in which there exists a hyperplane $\hat{\theta}_{MLE}^T x = 0$ that separates the classes perfectly. In that case every training point can be assigned to its class with probability approaching one (points on the hyperplane itself receive $\sigma(\hat{\theta}_{MLE}^T x) = \tfrac{1}{2}$), and the algorithm drives the estimates toward infinity ($\|\hat{\theta}_{MLE}\| \rightarrow \infty$). In terms of geometry, rather than an s-curve the link function evolves toward a step shape. In this setting we are dealing with overfitting to the development data. Treatments for this problem include k-fold cross-validation, or else regularization terms residing within a penalty function that restricts the size of the coefficient estimates, an example of the latter being the LASSO technique that features the linear penalty $C(\theta|\lambda) = \lambda\|\theta\|_1$ with cost parameter $\lambda$.
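To make the estimation described above concrete, the following is a minimal sketch (not the study's actual implementation) of fitting an L1-regularized logistic regression PD model, i.e., minimizing the cross-entropy (4) plus a LASSO penalty; the synthetic data, coefficients and base default rate are hypothetical placeholders.

```python
# Minimal sketch (not the study's implementation): an L1-regularized logistic
# regression fit by minimizing the cross-entropy (4) plus a LASSO penalty.
# The data, coefficients and base default rate below are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 8))                          # stand-in for eight risk factors
beta_true = np.array([-0.5, 2.6, -0.4, 0.5, -0.5, -0.6, -0.2, -0.1])
pd_true = 1.0 / (1.0 + np.exp(-(X @ beta_true - 4.5)))  # low base default rate
y = rng.binomial(1, pd_true)

# penalty="l1" implements the LASSO cost C(theta|lambda) = lambda * ||theta||_1;
# in scikit-learn, C is the inverse of the cost parameter lambda.
lrm = LogisticRegression(penalty="l1", C=0.1, solver="liblinear", max_iter=1000)
lrm.fit(X, y)

print("coefficients :", np.round(lrm.coef_.ravel(), 3))
print("in-sample AUC:", round(roc_auc_score(y, lrm.predict_proba(X)[:, 1]), 3))
```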
We now describe the ReLU-DNN model, among the most popular of DNNs, as the ReLU activation function in the hidden layers is known to have better properties than the sigmoid activation function discussed above for the LRM. As detailed by Sudjianto et al. (2020), this model exhibits superior performance, rich expressivity, universal approximation ability, double descent risk curves under over-parameterization, as well as fast and scalable training algorithms by stochastic gradient descent. Let $\mathcal{N}$ denote a feedforward ReLU network with an input layer $x \in \Omega \subseteq \mathbb{R}^d$ and $L$ hidden layers with neuron sizes $[n_1, \ldots, n_L]$. For each hidden neuron $u_i^{(l)}$, denote its leftward input as $z_i^{(l)}$ and its rightward output as $\chi_i^{(l)}$; then by the ReLU activation
$$\chi_i^{(l)} = \max\left\{0, z_i^{(l)}\right\}, \quad i = 1, \ldots, n_l, \; l = 1, \ldots, L. \tag{5}$$
From layer $l-1$ to $l$, given the weight matrix $W^{(l-1)}$ of size $n_l \times n_{l-1}$ and the bias vector $b^{(l-1)}$ of size $n_l \times 1$, the layer-$(l-1)$ output $\chi^{(l-1)}$ leads to the layer-$l$ input $z^{(l)}$ by the affine transformation
$$z^{(l)} = W^{(l-1)}\chi^{(l-1)} + b^{(l-1)}, \quad l = 1, \ldots, L, \tag{6}$$
where layer zero corresponds to the input layer with $\chi^{(0)} = x$, $W^{(0)}$ of size $n_1 \times d$ and $b^{(0)}$ of size $n_1 \times 1$. At the output layer (i.e., layer $L+1$) the final hidden output $\chi^{(L)}$ predicts the target $y$ according to a generalized linear model ("GLM"), which in the case of classification can use the sigmoid link function as defined previously for the LRM:
$$E(y) = \sigma\left(w^{(L)}\chi^{(L)} + b^{(L)}\right) = \sigma(\eta(x)), \tag{7}$$
where, for a univariate formulation with a binary response, $W^{(L)} \equiv w^{(L)}$ is a row vector and $b^{(L)}$ is a scalar. For convenience, denote the set of network parameters by $\theta = \{W^{(l)}, b^{(l)}: l = 0, 1, \ldots, L\}$. In the case where $y$ is one-hot encoded (i.e., a categorical indicator taking the values zero or one) the softmax link may be utilized, so this construct may be readily extended. Every hidden neuron has two states under ReLU activation: active when $z \ge 0$ ("on") or inactive when $z < 0$ ("off"). We analyze these states for all of the hidden neurons in the ReLU network by leveraging the concept of an activation pattern (Sudjianto et al., 2020): for a network $\mathcal{N}$ having $L$ hidden layers with neuron sizes $[n_1, \ldots, n_L]$, the activation pattern is a binary vector
$$P = \left[P^{(1)}; \ldots; P^{(L)}\right] \in \{0,1\}^{\sum_{l=1}^L n_l}, \tag{8}$$
which indicates the state (on vs. off) of each hidden neuron. We term the component $P^{(l)}$, $l = 1, \ldots, L$, a layered pattern. In the case where $P^{(l)} = \mathbf{0}$ for at least one $l$ we call the activation pattern $P$ trivial. The count of the network's hidden neurons, $\sum_{l=1}^L n_l$, is called the activation pattern's length. A given input $x \in \Omega$, propagated through the feedforward network with fixed parameters, determines the values of $\{z^{(l)}, \chi^{(l)}\}$ by (6) and (5) in sequence for $l = 1, \ldots, L$, where each $z^{(l)}$ determines the states of the layer-$l$ neurons, which is equivalent to a layered pattern $P^{(l)}(x)$. This implies that each instance of input $x$ is paired with an activation pattern
$$P(x) = \left[P^{(1)}(x); \ldots; P^{(L)}(x)\right], \tag{9}$$
and we note that such a pattern may or may not be trivial. Furthermore, the activation region associated with every distinct pattern is given by a convex polytope
$$\mathcal{R} = \{x \in \Omega: P(x) = P\}, \tag{10}$$
having closed-form boundaries, which we discuss in more detail later. The concept of the activation region is critical in grasping the way in which the hidden layers partition the entire space $\Omega$ into disjoint sub-spaces. Therefore, for ReLU-DNNs in general, where $\mathcal{P}_{expr}(\mathcal{N})$ represents the set of distinct activation patterns, each pattern $P = [P^{(1)}; \ldots; P^{(L)}] \in \mathcal{P}_{expr}(\mathcal{N})$ may be paired with a convex activation region $\mathcal{R}_P \subset \Omega$. The latter obeys a set of inequality constraints derived from the layer-wise affine-transformed variables, of the following form:
$$(-1)^{P^{(l)}} \odot z^{(l)} \le 0, \quad l = 1, \ldots, L, \tag{11}$$
where $\odot$ is the Hadamard product. For any pair of distinct activation patterns, the associated activation regions are disjoint. This implies that the ReLU-DNN represents a partition of the underlying input space into a finite set of convex sub-spaces
$$\Omega = \bigcup_{P \in \mathcal{P}_{expr}(\mathcal{N})} \mathcal{R}_P. \tag{12}$$
Note that $\mathcal{P}_{expr}(\mathcal{N})$ is not determined in closed form and continues to be an ongoing research area in the study of DNN expressivity, which is dictated by upper and lower bounds on the aggregate count of distinct patterns (Sudjianto et al., 2020). The latter authors note that the cardinality of $\mathcal{P}_{expr}(\mathcal{N})$ has an upper bound of $O(k^{dL})$ for a ReLU-DNN having $L$ hidden layers of width $k$. Any ReLU-DNN performs a partition of the input space into multiple sub-regions determined by a particular activation pattern. The multi-layer propagation process transforms the original input sequentially according to (6), constrained by the ReLU activation as defined in (5). At the end of the process the linear prediction takes the form
$$\eta(x) = w^{(L)}\chi^{(L)} + b^{(L)}, \tag{13}$$
up to $\sigma(\cdot)$, a pre-specified link function. The final features $\chi^{(L)}$ are then explicitly derived with respect to each sub-region, giving rise to closed-form local linear models. If the activation pattern is trivial (i.e., at least one layer features neurons that are all inactive), the output of that layer is all zero. In the associated trivial region the prediction is constant, as the final output is influenced only by layer-wise bias terms and not by the original input variables $x$.
In the case where the activation pattern $P = [P^{(1)}; \ldots; P^{(L)}]$ is non-trivial, we define a diagonal matrix $D^{(l)}$ corresponding to each layer, whose diagonal contains the same $(0,1)$ values as $P^{(l)}$,
$$D^{(l)} = \mathrm{diag}\left(P^{(l)}\right), \quad l = 1, \ldots, L, \tag{14}$$
so that the output of each layer after activation by the ReLU (5) may be expressed in vector notation as
$$\chi^{(l)} = \max\left\{0, z^{(l)}\right\} = D^{(l)}z^{(l)} = D^{(l)}\left(W^{(l-1)}\chi^{(l-1)} + b^{(l-1)}\right), \quad l = 1, \ldots, L. \tag{15}$$
Thus, we may derive the final features $\chi^{(L)}$ recursively as follows,
$$\begin{aligned}\chi^{(L)} &= D^{(L)}\left(W^{(L-1)}\chi^{(L-1)} + b^{(L-1)}\right) = D^{(L)}W^{(L-1)}D^{(L-1)}\left(W^{(L-2)}\chi^{(L-2)} + b^{(L-2)}\right) + D^{(L)}b^{(L-1)} = \cdots \\ &= D^{(L)}W^{(L-1)}\cdots D^{(1)}W^{(0)}x + \sum_{l=1}^{L-1} D^{(L)}W^{(L-1)}\cdots D^{(l+1)}W^{(l)}D^{(l)}b^{(l-1)} + D^{(L)}b^{(L-1)} \\ &= \prod_{h=1}^{L} D^{(L+1-h)}W^{(L-h)}\,x + \sum_{l=1}^{L-1}\prod_{h=1}^{L-l} D^{(L+1-h)}W^{(L-h)}\,D^{(l)}b^{(l-1)} + D^{(L)}b^{(L-1)}.\end{aligned} \tag{16}$$
It follows that the linear prediction (13) in the output layer has an explicit expression given by
$$\eta(x) = \prod_{h=1}^{L} W^{(L+1-h)}D^{(L+1-h)}\,W^{(0)}x + \sum_{l=1}^{L}\prod_{h=1}^{L+1-l} W^{(L+1-h)}D^{(L+1-h)}\,b^{(l-1)} + b^{(L)}, \tag{17}$$
in which $W^{(L)} \equiv w^{(L)}$ for notational convenience. This leads to the result that characterizes a ReLU-DNN network, for any of its expressible activation patterns $P$ on the activation region $\mathcal{R}_P$, in terms of an LLM:
$$\eta(x) = \tilde{w}_P x + \tilde{b}_P, \quad \forall x \in \mathcal{R}_P, \tag{18}$$
with the following closed-form parameters
$$\tilde{w}_P = \prod_{h=1}^{L} W^{(L+1-h)}D^{(L+1-h)}\,W^{(0)}, \qquad \tilde{b}_P = \sum_{l=1}^{L}\prod_{h=1}^{L+1-l} W^{(L+1-h)}D^{(L+1-h)}\,b^{(l-1)} + b^{(L)}. \tag{19}$$
Combining the region partitioning in (12) with the LLMs in (19) gives rise to the local linear representation of ReLU-DNNs. This means that a ReLU-DNN has an equivalent representation as a finite set of LLMs, each of which is defined on only one of these disjoint convex sub-regions. We can see from this that the activation pattern is a central concept in this local linear representation. It only remains to determine the activation patterns for purposes of interpretability of the network, where we refer the reader to Sudjianto et al. (2020) for the development of an effective unwrapper for pre-trained ReLU-DNNs that we utilize in this paper.
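The local linear representation in (18)–(19) can be illustrated with a small numerical sketch: for a randomly weighted ReLU network we record the activation pattern of an input and recover the local linear coefficients, verifying that they reproduce the forward pass. This is an illustration only, not the unwrapper of Sudjianto et al. (2020), and the weights and dimensions are arbitrary assumptions.

```python
# Illustrative sketch of the local linear representation (18)-(19): record the
# activation pattern of an input to a small, randomly weighted ReLU network and
# recover the local linear coefficients, which reproduce the forward pass.
# This is not the unwrapper of Sudjianto et al. (2020); weights are arbitrary.
import numpy as np

rng = np.random.default_rng(1)
d, widths = 4, [6, 5]                                 # input dimension, hidden layer sizes
W = [rng.normal(size=(widths[0], d)),                 # W(0): input -> hidden layer 1
     rng.normal(size=(widths[1], widths[0])),         # W(1): hidden layer 1 -> 2
     rng.normal(size=(1, widths[1]))]                 # w(L): hidden layer 2 -> output
b = [rng.normal(size=widths[0]), rng.normal(size=widths[1]), rng.normal(size=1)]

def forward(x):
    """Forward pass returning the linear prediction eta(x) and the activation pattern."""
    chi, pattern = x, []
    for l in range(len(widths)):
        z = W[l] @ chi + b[l]                         # affine transformation, eq. (6)
        pattern.append((z > 0).astype(float))         # on/off state of each hidden neuron
        chi = np.maximum(0.0, z)                      # ReLU activation, eq. (5)
    return (W[-1] @ chi + b[-1]).item(), pattern

def local_linear(pattern):
    """Closed-form (w_P, b_P) of the LLM eta(x) = w_P x + b_P on the activation region."""
    A, c = np.eye(d), np.zeros(d)                     # chi(0) = x, i.e., A = I, c = 0
    for l in range(len(widths)):
        D = np.diag(pattern[l])                       # D(l) = diag(P(l)), eq. (14)
        A = D @ W[l] @ A                              # recursion implied by eq. (15)-(16)
        c = D @ (W[l] @ c + b[l])
    return (W[-1] @ A).ravel(), (W[-1] @ c + b[-1]).item()

x = rng.normal(size=d)
eta, pattern = forward(x)
w_P, b_P = local_linear(pattern)
print("network prediction:", round(eta, 6))
print("local linear model:", round(float(w_P @ x + b_P), 6))  # identical within the region of x
```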
The second IML model under consideration is the GAMI-Net. In this model we formulate a complex functional relationship by building up from lower-order representations, namely nonlinear main effects as well as pairwise interactions. We denote by $S_1$ and $S_2$ the sets of active main effects and active pairwise interactions, respectively. Then the GAMI-Net may be written as follows:
$$g\left(E(y \mid x)\right) = \mu + \sum_{j \in S_1} h_j(x_j) + \sum_{(j,k) \in S_2} f_{j,k}(x_j, x_k). \tag{20}$$
In this set-up, each of the main effects and pairwise interactions is assumed to have zero mean,
$$\int h_j(x_j)\,dF(x_j) = 0 \;\; \forall j \in S_1, \qquad \int f_{j,k}(x_j, x_k)\,dF(x_j, x_k) = 0 \;\; \forall (j,k) \in S_2, \tag{21}$$
where in (21) $dF(x_j)$ and $dF(x_j, x_k)$ are the respective marginal and joint distribution functions. With a view to ensuring identifiability, we enforce the near orthogonality of each pairwise interaction $f_{j,k}(x_j, x_k)$ with respect to its corresponding parent main effects $h_j(x_j)$ and $h_k(x_k)$. Note that this requirement is implemented through a marginal clarity constraint that we discuss below.
The architecture of the GAMI-Net model features a main effect and a pairwise interaction sub-model. Each main effect $h_j(x_j)$ in (20) consists of a sub-network having a single input node, multiple hidden layers and a single output node. The pairwise interactions $f_{j,k}(x_j, x_k)$ in (20) are each represented by a sub-network having two input nodes. These networks are all combined linearly, with an additional bias node capturing the intercept $\mu$, to produce the ultimate output. In particular, each main effect sub-network is projected onto a 1-dimensional curve, while each interaction sub-network is calibrated to a 2-dimensional surface. An arbitrary curve or surface may be approximated by a single-hidden-layer feedforward neural network having a sufficiently large number of hidden nodes. In this process we leverage the techniques of modern deep learning to facilitate the use of multiple hidden layers, which results in augmented model performance. If the network is properly configured, this construct has enough flexibility to capture any functional form, including categorical variables that may be preprocessed with one-hot encoding. For categorical variables, the main effect sub-network may be simplified to a set of bias nodes, each representing an intercept effect with respect to a corresponding category. Finally, we note that sparsity, heredity and marginal clarity constraints are imposed in GAMI-Net model development. The sparsity and heredity constraints are meant to facilitate the interpretability of the estimated model, while the purpose of the marginal clarity constraint is to ensure that the main effects and their corresponding child pairwise interactions are uniquely identifiable.
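The following is a minimal PyTorch sketch of the additive structure in (20), with one small subnetwork per main effect and per pairwise interaction combined through an intercept and a sigmoid link. It illustrates the architecture only; it is not the authors' GAMI-Net implementation, it omits the sparsity, heredity and marginal clarity constraints just described, and the shapes and the single candidate interaction are hypothetical.

```python
# Minimal PyTorch sketch of the additive structure in (20): one small subnetwork per
# main effect and per pairwise interaction, an intercept, and a sigmoid link. This
# illustrates the architecture only; it is not the authors' GAMI-Net implementation
# and omits the sparsity, heredity and marginal clarity constraints.
import torch
import torch.nn as nn

def subnet(in_dim, hidden=16):
    """A small feedforward subnetwork with a single output node."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, 1))

class AdditiveNet(nn.Module):
    def __init__(self, n_features, interactions):
        super().__init__()
        self.main = nn.ModuleList([subnet(1) for _ in range(n_features)])
        self.pairs = interactions                          # list of (j, k) index pairs
        self.inter = nn.ModuleList([subnet(2) for _ in interactions])
        self.mu = nn.Parameter(torch.zeros(1))             # intercept

    def forward(self, x):
        out = self.mu + sum(net(x[:, [j]]) for j, net in enumerate(self.main))
        out = out + sum(net(x[:, list(jk)]) for jk, net in zip(self.pairs, self.inter))
        return torch.sigmoid(out).squeeze(-1)              # PD under the logit link

# Hypothetical usage: eight risk factors, one candidate interaction, cross-entropy loss.
model = AdditiveNet(n_features=8, interactions=[(1, 5)])
x, y = torch.randn(256, 8), torch.randint(0, 2, (256,)).float()
loss = nn.functional.binary_cross_entropy(model(x), y)
loss.backward()
```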
The final model that we consider is the explainable boosting machine ("EBM") introduced by Lou et al. (2013), an IML model designed at the time to have accuracy comparable to widely used ML methods such as random forests or boosted trees, which we describe at a high level, referring the reader to the reference for mathematical details. The EBM is a tree-based, cyclic gradient boosting GAM in which interaction detection is automatic. While this model takes more computational overhead to estimate than other similar ML models, the EBM has the desirable features of extreme compactness and rapid execution in producing predictions. The EBM is a GAM of the same form as the GAMI-Net model, with the difference that the constraints described above are not imposed in this setting. In this model the GAM link function is adapted to alternative objectives, such as regression or classification problems, and when introduced it was recognized as an improvement over traditional ML models. First, the EBM learns the functional form for each input variable through standard ML algorithms, for example techniques like gradient boosting or bagging. In the boosting process, restrictions are imposed to train features one by one (a "round-robin"), using a very low learning rate, implying that the order of features is irrelevant. The algorithm then cycles across the features in order to mitigate collinearity effects while learning the optimal corresponding functional form. This process provides visibility into the manner in which each feature contributes to the prediction of the model. Second, the EBM automatically detects and then includes pairwise interaction terms of the same form, which further augments model accuracy while at the same time enhancing interpretability. Finally, the EBM is implemented in a parallel manner in the senses of both multi-core and multi-machine architectures. Since the EBM is additive in input factors, every feature contributes to the prediction modularly, which facilitates comprehension of the relative individual contributions to the prediction. In making individual predictions, each term functions as a lookup table with respect to its feature, producing in the process a quantification of the feature's contribution. These term contributions are then summed and entered into the link function in order to produce the final prediction. This additivity means that term contributions may be sorted and visually represented in order to demonstrate the features having the greatest influence on any particular prediction. In order to enforce additivity of the individual terms there is a training penalty imposed upon the EBM, which increases training time as compared to similar techniques. But since the prediction process is a matter of lookups and arithmetic operations inside the feature-wise GAM terms, EBMs are amongst the most rapid ML models for prediction at the point of model implementation or time of production.
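As an illustration, the open-source `interpret` package provides a reference implementation of the EBM described above; the sketch below (with synthetic placeholder data, and an API that may differ across package versions) fits a classifier with automatic pairwise interaction detection and retrieves the global explanation object.

```python
# Hedged sketch using the open-source `interpret` package, a reference implementation
# of the EBM; data here are synthetic placeholders and the API may vary by version.
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
X = rng.normal(size=(5000, 8))                                # stand-in risk factors
y = rng.binomial(1, 1.0 / (1.0 + np.exp(4.5 - X[:, 1] + X[:, 5])))

ebm = ExplainableBoostingClassifier(interactions=5, random_state=0)
ebm.fit(X, y)                                                 # round-robin cyclic boosting

pd_hat = ebm.predict_proba(X)[:, 1]                           # additive term contributions -> PD
print("in-sample AUC:", round(roc_auc_score(y, pd_hat), 3))

# Global interpretability: per-term shape functions and importances; in a notebook,
# `from interpret import show; show(ebm_global)` renders the interactive plots.
ebm_global = ebm.explain_global()
```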
In this section we describe the data used in developing the models benchmarked in this paper.
The primary source of data is Moody's Default Risk ServiceTM ("DRS") history of credit ratings, a comprehensive source for rating migrations, default and recovery rates spanning regions and industrial sectors. We obtain standardized financial statement line items and market data from the CompustatTM database. This includes the industry classification system identifiers Global Industry Classification Standards ("GICS") and North American Industry Classification System ("NAICS"). These data are available starting in 1979 and span several economic cycles. This database also provides indicators of company default to supplement DRS - namely types of bankruptcy, liquidation as well as default rating grades from the Nationally Recognized Statistical Rating Organizations ("NRSROs") - which are all considered standard industry definitions of default. We further supplement the latter default types in DRS and CompustatTM with a list of company defaults provided by New Generation Research, which is known in the industry as Bankruptcydata.com. In the remainder of this paper we will refer to this base dataset as the "Moody's population" or "Moody's rated obligors".
We then apply a series of filters to this Moody's base dataset in order to obtain a population representative of a North American segment of large commercial & industrial ("C&I") corporate borrowers having agency ratings. We accomplish this with a combination of NAICS and GICS industry codes, regional indicators and a floor on historical annual net sales. Non-C&I obligors flagged for exclusion are defined by the NAICS codes indicating financial firms, commercial real estate, real estate investment trusts, public sector, government, dealer finance and not-for-profit (see Figure 1 below). We perform a similar filter with respect to the GICS classification for education, financials and real estate (see Figure 2 below). Regional filters are applied to choose only obligors based in the U.S. and Canada, and we retain only obligors whose maximum historical annual net sales is at least $1B. There are further exclusions for cases of missing or invalid GICS or NAICS codes. We consider only observations after the 1st quarter of 1991, for the rationale that market and accounting regulations were rather different prior to the 1990s, as well as that the macroeconomic data used in this study are only available starting in 1990.
There are also filters applied that are based upon the default markings. Records considered to be too near to an event of default are excluded, an industry standard treatment, where the rationale is that the records of a company in such a time window do not reflect information about a future default but rather are more likely to reflect existing problems. We further limit ourselves to data that lie in a range of 6–18 (as opposed to 1–12) months prior to default, as this time frame is typically more reflective of the period between when financial statements are issued and when credit ratings are refreshed (i.e., this process usually takes up to about six months depending on the time needed to receive complete raw financials, input processed financial ratios and then finalize the ratings). Another point is that while in general we do not consider obligors' financial statements after the default date, there are some cases in which an obligor may exit a default state or "cure" (e.g., emerge from bankruptcy), in which case only the statements between the default date and cure date are excluded.
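As an illustration of the default-window treatment just described, the following hedged pandas sketch labels a statement record as a default observation when the obligor defaults between 6 and 18 months after the statement date and drops records within 6 months of the default date; the column names and dates are hypothetical and the study's exact filtering rules may differ.

```python
# Hedged pandas sketch of the default-window treatment: a quarterly record is labeled
# a default observation if the obligor defaults 6-18 months after the statement date,
# and records within 6 months of default are dropped. Column names are hypothetical
# and the study's exact filtering rules may differ.
import pandas as pd

obs = pd.DataFrame({
    "obligor_id":     [1, 1, 1, 1],
    "statement_date": pd.to_datetime(["2008-03-31", "2008-06-30", "2008-09-30", "2008-12-31"]),
    "default_date":   pd.to_datetime(["2009-10-15"] * 4),     # NaT for non-defaulters
})

months_to_default = (obs["default_date"] - obs["statement_date"]).dt.days / 30.44
obs["default_flag"] = ((months_to_default >= 6) & (months_to_default <= 18)).astype(int)
obs = obs[~(months_to_default < 6)]                           # drop records too close to default
print(obs[["statement_date", "default_flag"]])
```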
The time period covered by the model development data extends to the 4th quarter of 2015. In Table 1 below we show the composition of the modeling population according to the GICS industry sectors. For each sector, the Defaulted Obligors column represents the proportion of all defaulted obligors in the population that fall within that sector. The observations are concentrated in the Consumer Discretionary (20%), Industrials (17%), Technology Hardware and Communications (12%) and Energy except E&P (11%) sectors. We show a similar industry composition according to the NAICS classification system below in Table 2.
GICS Segment | Defaulted Obligors | All Obligors |
Consumer Discretionary | 30.9% | 19.6% |
Consumer Staples | 6.4% | 8.4% |
Energy | 5.9% | 7.6% |
Healthcare Equipment and Services | 2.9% | 2.9% |
Industrials | 15.1% | 31.6% |
Materials | 11.3% | 10.5% |
Pharmaceuticals and Biotechnology | 0.2% | 2.7% |
Software and IT Services | 1.8% | 2.5% |
Technology Hardware and Communications | 11.3% | 4.3% |
Utilities | 5.6% | 7.6% |
NAICS Segment | Defaulted Obligors | All Obligors |
Agriculture, Forestry, Hunting and Fishing | 0.4% | 0.2% |
Accommodation and Food Services | 2.9% | 2.3% |
Waste Management and Remediation Services | 2.1% | 2.4% |
Arts, Entertainment and Recreation | 1.0% | 0.7% |
Construction | 2.5% | 1.7% |
Educational Services | 0.2% | 0.1% |
Healthcare and Social Assistance | 1.6% | 1.6% |
Information Services | 12.1% | 11.5% |
Management of Companies and Enterprises | 0.1% | 0.1%
Manufacturing | 34.4% | 37.7% |
Mining, Oil and Gas | 8.6% | 6.8% |
Other Services (excluding Public Administration) | 0.6% | 0.4% |
Professional, Scientific and Technological Services | 2.5% | 2.3% |
Real Estate, Rentals and Leasing | 1.6% | 0.9% |
Retail Trade | 12.4% | 9.6% |
Transportation and Warehousing | 7.0% | 5.4% |
Utilities | 5.4% | 8.3% |
Wholesale Trade | 2.7% | 7.0% |
The model development dataset is comprised of financial ratios and default indicators most recently available from DRSTM, CompustatTM and Bankruptcydata.com, and as a result we consider these data to be timely and of favorable quality to support the development of robust models. As the time period for model development of 1Q91–4Q15 spans two economic downturns (i.e., a complete business cycle), the length of the dataset is another factor that supports good data quality. Related to the latter point, we plot in Figure 1 below annual one- and three-year default rates in the model development dataset, and we can see that trends in the default rates are intuitive as they peak during the downturn periods. In developing the models in this paper we elect to use the 1-year default indicator as the target variable, for reasons discussed in more detail in the following section.
Table 3 below shows summary statistics for the variables appearing in our final models. The final models were chosen based upon an exhaustive search algorithm along with 5-fold cross-validation[1]. Observation counts are around 157K and the one-year default rate is about 1%. The variable categories and variable names of the explanatory variables that appear in the final models are listed below the table[2]:
[1] Clarifying our model selection criteria and process, we balance multiple criteria, both in terms of statistical performance as well as some qualitative considerations. Firstly, all models have to exhibit the stability of factor selection (where the signs on coefficient estimates are constrained to be economically intuitive) and statistical significance in k-fold cross validation sub-sample estimation. However, this is constrained by the requirement that we have only a single financial factor chosen from each category. Then the models that meet these criteria are evaluated according to statistical performance metrics such as AIC and AUC, as well as other considerations such as rating mobility and relative factor weights.
[2] All candidate explanatory variables are Winsorized at either the 10th, 5th or 1st percentile levels at either tail of the sample distribution, in order to mitigate the influence of outliers or contamination in data, according to a customized algorithm that analyzes the gaps between these percentiles and caps / floors where these are maximal.
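As an illustration of the treatment in footnote [2], the sketch below applies a simple symmetric Winsorization at fixed percentiles; the study's customized algorithm for choosing among the 10th, 5th and 1st percentile levels is not reproduced here, and the data are synthetic.

```python
# Simple illustration of Winsorization: cap and floor a ratio at fixed percentiles in
# each tail. The study's customized rule for choosing among the 10th/5th/1st percentile
# levels is not reproduced here.
import numpy as np

def winsorize(x, lower_pct=1.0, upper_pct=99.0):
    lo, hi = np.nanpercentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

rng = np.random.default_rng(3)
ratio = np.concatenate([rng.normal(0.6, 0.2, 1000), [25.0, -30.0]])  # two artificial outliers
print("raw range       :", ratio.min(), ratio.max())
print("winsorized range:", winsorize(ratio).min(), winsorize(ratio).max())
```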
Variable | Count | Mean | Standard Deviation | Minimum | 25th Percentile | Median | 75th Percentile | Maximum |
Default Indicator | 157,353 | 0.01 | 0.10 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
CTA | | 0.14 | 0.35 | −0.40 | −0.01 | 0.06 | 0.17 | 3.21
TLTAR | | 0.60 | 0.23 | 0.12 | 0.45 | 0.59 | 0.71 | 1.53
CUR | | 1.90 | 2.84 | −22.43 | 1.41 | 2.06 | 2.65 | 19.00
NARDR | | 130.25 | 101.44 | 11.26 | 68.98 | 106.74 | 159.43 | 754.09
NQR | | 0.34 | 1.07 | −0.85 | −0.28 | 0.06 | 0.59 | 6.11
BTPM | | 5.94 | 21.00 | −146.67 | 1.85 | 7.09 | 12.85 | 48.70
SP500EPI | | 1.91 | 6.09 | −27.33 | −0.19 | 2.19 | 5.68 | 12.81
CCI | | 2.34 | 21.58 | −60.97 | −7.02 | 4.89 | 15.35 | 73.21
● Size: Change in Total Assets ("CTA")
● Leverage: Total Liabilities to Total Assets Ratio ("TLTAR")
● Coverage: Cash Use Ratio ("CUR")
● Efficiency: Net Accounts Receivables Days Ratio ("NARDR")
● Liquidity: Net Quick Ratio ("NQR")
● Profitability: Before Tax Profit Margin ("BTPM")
● Macroeconomic: S&P 500 Equity Price Index Quarterly Average Annual Change ("SP500EPI"), Consumer Confidence Index ("CCI")
Missing rates and the area under the receiver operating characteristic curve ("AUC") statistics (measuring the power of the variables to distinguish default from non-default on a univariate basis) for the explanatory variables appearing in the final models are summarized below in Table 4[3]. Across risk factors the univariate AUCs lie in a range of around 0.6 to 0.8, which indicates a strong capability to rank order default risk amongst these variables on a univariate basis. The rate of missing observations remains below 10% across risk factors, which indicates good data quality that supports the development of robust models.
[3] The plots are omitted for the sake of brevity and are available upon request.
| Category | Explanatory Variables | AUC | Missing Rate |
| --- | --- | --- | --- |
| Size | CTA | 0.726 | 8.52% |
| Leverage | TLTAR | 0.843 | 4.65% |
| Coverage | CUR | 0.788 | 7.94% |
| Efficiency | NARDR | 0.615 | 8.17% |
| Liquidity | NQR | 0.653 | 7.71% |
| Profitability | BTPM | 0.827 | 2.40% |
| Macroeconomic | SP500EPI | 0.603 | 0.00% |
| Macroeconomic | CCI | 0.607 | 0.00% |
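As a rough illustration of how the Table 4 diagnostics might be produced, the sketch below computes missing rates and univariate AUCs for a set of candidate risk factors; the data are synthetic stand-ins and the column names merely mirror the paper's factor mnemonics.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def univariate_screen(X: pd.DataFrame, y: pd.Series) -> pd.DataFrame:
    """Missing rate and univariate AUC (on non-missing rows) for each candidate risk factor."""
    rows = []
    for col in X.columns:
        x = X[col]
        mask = x.notna()
        auc = roc_auc_score(y[mask], x[mask])
        # A factor may rank default risk in either direction, so report max(AUC, 1 - AUC).
        rows.append({"variable": col, "missing_rate": 1.0 - mask.mean(), "auc": max(auc, 1.0 - auc)})
    return pd.DataFrame(rows)

# Toy example with a roughly 1% default rate and 5% missingness injected into one factor.
rng = np.random.default_rng(0)
y = pd.Series(rng.binomial(1, 0.01, 10_000))
X = pd.DataFrame({"TLTAR": rng.normal(0.6, 0.2, 10_000) + 0.3 * y,
                  "NQR": rng.normal(0.3, 1.0, 10_000) - 0.5 * y})
X.loc[rng.random(10_000) < 0.05, "NQR"] = np.nan
print(univariate_screen(X, y))
```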
We shall first describe the LRM estimation results shown below in Table 5. The signs of the coefficient estimates are consistent with economic intuition, and the levels of statistical significance indicate that the parameters are estimated very precisely. The AUC statistic indicates that the model has a strong ability to rank order default risk, in line with favorable performance by industry standards. Regarding predictive accuracy, the p-value of 0.60 in the Hosmer-Lemeshow ("HL") test shows that the model fits the data well, and the AIC corroborates this favorable fit. The singular value decomposition ("SVD") mobility measure has a value of 0.72, consistent with the expectation that PD-implied ratings from this model are very sensitive to the state of the macroeconomy, as should be the case for such a class of PD model, as discussed below. The relative sizes of the factor importance ("FI") measures, which measure the percent of the model score in totality attributed to each risk factor, are also consistent with a model showing the expected characteristics of this type of PD model. The so-called rating philosophy reflected here is that of a point-in-time ("PIT") model, which predicts default within a relatively short horizon and is used in early warning systems, and for which the factors expected to have more importance should span the dimensions of borrower profitability, liquidity or efficiency. This contrasts with so-called through-the-cycle ("TTC") models, which predict default over a relatively long horizon, are suitable for credit underwriting and tend to place more importance on dimensions such as capital structure, size or debt service coverage (Jacobs, 2022b). Accordingly, there is less weight on risk factors more critical to credit underwriting (i.e., factors in the categories Size/Scale, Leverage/Capital Structure and Debt Service Coverage) that would receive higher weight in a TTC model, whereas in this PIT model there is greater emphasis on factors considered more critical to early warning or credit portfolio management (i.e., Liquidity, Profitability and Efficiency). Note that we elect a one-year rather than a three-year prediction horizon because we wish to construct a PIT model, which we believe makes for the most meaningful comparison to the ML models, as PIT models tend to have stronger predictive accuracy than TTC models (Jacobs, 2022b).
| Explanatory Variable | Parameter Estimate | P-Value | Factor Importance | AIC | AUC | HL P-Value | Mobility Index |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTA | −0.4837 | 0.0000 | 0.0455 | 7,231.00 | 0.9312 | 0.5945 | 0.7184 |
| TLTAR | 2.6170 | 0.0104 | 0.1091 | | | | |
| CUR | −0.0428 | 0.0000 | 0.1545 | | | | |
| NARDR | 0.0005 | 0.0000 | 0.2273 | | | | |
| NQR | −0.4673 | 0.0000 | 0.0909 | | | | |
| BTPM | −0.0161 | 0.0000 | 0.2736 | | | | |
| SP500EPI | −0.0189 | 0.0000 | 0.0759 | | | | |
| CCI | −0.0099 | 0.0000 | 0.0232 | | | | |
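For concreteness, the sketch below shows one way the Table 5 diagnostics could be assembled with statsmodels and scikit-learn on synthetic data: the Hosmer-Lemeshow statistic is a simple decile-based implementation, and the RANGE-FI weights are normalized to sum to one, which is an assumption on our part about the scaling convention (the formal RANGE-FI definition appears further below).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import chi2
from sklearn.metrics import roc_auc_score

def hosmer_lemeshow(y, p, g=10):
    """Simple Hosmer-Lemeshow goodness-of-fit statistic and p-value over g probability deciles."""
    df = pd.DataFrame({"y": y, "p": p})
    df["bin"] = pd.qcut(df["p"], g, duplicates="drop")
    grp = df.groupby("bin", observed=True)
    obs, exp, n = grp["y"].sum(), grp["p"].sum(), grp.size()
    stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    return stat, chi2.sf(stat, len(n) - 2)

def range_fi(params: pd.Series, X: pd.DataFrame) -> pd.Series:
    """Absolute coefficient times factor range, normalized here so the weights sum to one."""
    raw = params.abs() * (X.max() - X.min())
    return raw / raw.sum()

# Synthetic stand-in for the development sample (two factors only, for brevity).
rng = np.random.default_rng(1)
X = pd.DataFrame({"TLTAR": rng.normal(0.6, 0.2, 20_000), "NQR": rng.normal(0.3, 1.0, 20_000)})
eta = -5.0 + 2.5 * X["TLTAR"] - 0.5 * X["NQR"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

fit = sm.Logit(y, sm.add_constant(X)).fit(disp=0)
p_hat = fit.predict(sm.add_constant(X))
print("AIC:", fit.aic, " AUC:", roc_auc_score(y, p_hat))
print("HL statistic / p-value:", hosmer_lemeshow(y, p_hat))
print(range_fi(fit.params.drop("const"), X))
```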
In the figures above we present additional in-sample and out-of-sample performance statistics and diagnostic plots for the final LRM. We observe in-sample that the graphical diagnostics (e.g., time series and calibration plots; fit histograms) and additional goodness-of-fit statistics (e.g., binomial and Jeffrey's p-values; OLS r-squareds) confirm the previously discussed results, in that the final model shows a favorable fit to the data. However, other graphical residual diagnostics (e.g., residual-versus-fitted plots, quantile-quantile plots, residual histograms and leverage plots) show that the model has some issues in predictive accuracy or model specification. Finally, the out-of-sample analysis shows that the final model performs adequately.
We now present the benchmarking analysis of the alternative IML models (ReLU-DNN, GAMI-Net and EBM) against the LRM baseline. This analysis is developed using the Python PiML Toolbox (Sudjianto et al., 2023). In Table 6 below we show the comparison of the AUC and Accuracy (defined as the number of true positives and negatives over the number of observations) discriminatory power measures. We can see that the IML models all demonstrate some pickup in AUC performance, albeit the degree of improvement is modest, especially on an out-of-sample basis, ranging from around 1–2% (2–6%) in testing (training) samples. The ReLU-DNN (EBM) model shows the greatest increase (decrease) out-of-sample of 1.4% (0.4%), while the EBM (ReLU-DNN) model shows the greatest (least) increase in-sample of 5.6% (1.9%). This suggests that, on this basis, the EBM (ReLU-DNN) is most (least) prone to overfitting. On the other hand, according to the Accuracy measure the EBM performs best on both an in- and out-of-sample basis, whereas in the training (testing) sample the LRM (GAMI-Net) performs worst, which leads to a rather different conclusion than the AUC. That said, the AUC is the preferred measure in PD classification applications, so we have more confidence in the conclusions based upon the AUC. We depict this analysis graphically in the ROC and precision-recall plots in Figures 5 through 8 below.
| Sample | Measure | Logistic Regression Model | ReLU Activation Deep Neural Network | Generalized Additive Model with Structured Interactions | Explainable Boosting Machine Tree |
| --- | --- | --- | --- | --- | --- |
| Training Sample | AUC | 0.9312 | 0.9508 | 0.9527 | 0.9870 |
| Training Sample | Accuracy | 0.9935 | 0.9941 | 0.9943 | 0.9960 |
| Testing Sample | AUC | 0.9958 | 0.9677 | 0.9668 | 0.9506 |
| Testing Sample | Accuracy | 0.9958 | 0.9958 | 0.9955 | 0.9965 |
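The study's benchmarking is implemented with the PiML Toolbox; as a library-agnostic sketch of the same kind of comparison, the code below fits a logistic regression, a ReLU network and an EBM (via the interpret package) on a synthetic unbalanced sample and reports training and testing AUC and accuracy. GAMI-Net is omitted for brevity, and none of the resulting numbers correspond to Table 6.

```python
from interpret.glassbox import ExplainableBoostingClassifier  # pip install interpret
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic, heavily unbalanced classification problem standing in for the PD data.
X, y = make_classification(n_samples=20_000, n_features=8, weights=[0.99], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "LRM": LogisticRegression(max_iter=1000),
    "ReLU-DNN": MLPClassifier(hidden_layer_sizes=(40, 20), activation="relu",
                              max_iter=500, random_state=0),
    "EBM": ExplainableBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    for label, Xs, ys in [("train", X_tr, y_tr), ("test", X_te, y_te)]:
        p = model.predict_proba(Xs)[:, 1]
        print(f"{name:8s} {label:5s}  AUC={roc_auc_score(ys, p):.4f}  "
              f"ACC={accuracy_score(ys, model.predict(Xs)):.4f}")
```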
We now consider in more detail the concept of FI discussed previously in the LRM estimation results. The FI numeric value is interpreted as a score where a higher value indicates greater importance of the factor. The concept is similar to the magnitude of a regression coefficient in OLS, or a sensitivity measure in other kinds of models, but note that there are various concepts of FI that provide slightly different information. FI is commonly used as a tool for ML model interpretability, as from the FI scores it is in principle possible to explain why an ML model makes particular predictions and how we could manipulate features to change its predictions. We consider the following types of FI (a brief computational sketch of two of these measures follows the list):
● Range FI ("RANGE-FI") is commonly used in the industry in LRM applications and serves as our baseline; it is used in the previous estimation and shown in Table 5. The RANGE-FI is a post hoc explanation defined for each risk factor as the coefficient estimate multiplied by the range of the risk factor, scaled by the sum of the ranges of all the risk factors: $\mathrm{RANGE\text{-}FI}(x_i) \triangleq \frac{\beta_i \times [\max(x_i) - \min(x_i)]}{\sum_{j=1}^{p} [\max(x_j) - \min(x_j)]}$, where $\beta_i$ is the coefficient estimate corresponding to risk factor $x_i$, $i = 1, \ldots, p$. While this gives us a sense of how influential a factor is as part of fitting the model, the measure is sample dependent, provides only one particular view of importance, and does not measure sensitivity to the factor or how the factor contributes to some measure of model fit.
● Permutation FI ("PERM-FI") post hoc explanations measure the influence of individual risk factors on the model prediction by calculating the increase in a loss measure (in this PD modeling context, the loss in AUC) when the factor's values are permuted. When a factor's values are randomly shuffled, the relationship between the feature and the target is broken, and the resulting drop in model performance indicates the feature's significance. However, as different models can have very different FI rankings, this measure only reveals the importance of each feature to that specific model.
● Shapley FI ("SHAP-FI") additive post hoc local explanations are an ML tool that can explain the output of any model by computing the contribution of each feature to the final prediction based upon coalition game theory. SHAP-FI measures indicate how to justly allocate a "payout" (i.e., a prediction of the model) among the coalition of features as an average marginal contribution of a feature value across all the possible combinations of features. SHAP-FI possesses several attractive properties, such as local accuracy, missingness and consistency.
● Local Interpretable Model Agnostic FI ("LIME-FI") post hoc local explanations are a model agnostic explanation tool. The procedure underlying this measure involves creating a surrogate interpretable model, such as a Lasso or decision tree, to explain how the original model makes predictions for a given input sample. The algorithm creates a simulated dataset by randomly perturbing the input sample with noise, evaluates the output of the original model on these perturbed samples and then fits an interpretable model (Lasso in our case) on this simulated data, with weights assigned to each perturbed sample based on its proximity to the original input sample.
● Global FI ("GLOB-FI") is an inherent explanation that measures the global relative importance of each feature, calculated as the variance of the marginal effect $\operatorname{Var}[\beta_i \times x_i]$, $i = 1, \ldots, p$, on the training dataset. In the case of categorical features, we aggregate the marginal effects of all of the corresponding dummy variables and then calculate the variance. The GLOB-FI therefore provides a measure of how much the feature contributes to the overall variability in the model's predictions. In order to interpret the relative importance of each feature as a proportion of the total importance across all features, we normalize the feature importances so that their sum equals 1: $\mathrm{GLOB\text{-}FI}(x_i) \triangleq \frac{\operatorname{Var}[\beta_i \times x_i]}{\sum_{j=1}^{p} \operatorname{Var}[\beta_j \times x_j]}$.
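The sketch referenced in the list introduction illustrates two of these measures, PERM-FI (via scikit-learn's permutation_importance with an AUC scorer) and GLOB-FI, for a logistic regression fitted to synthetic data; SHAP-FI and LIME-FI require the separate shap and lime packages and are omitted here.

```python
import numpy as np
import pandas as pd
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic factors and a default indicator generated from a known logistic relationship.
rng = np.random.default_rng(2)
X = pd.DataFrame({"TLTAR": rng.normal(0.6, 0.2, 10_000),
                  "NQR": rng.normal(0.3, 1.0, 10_000),
                  "BTPM": rng.normal(6.0, 20.0, 10_000)})
eta = -5.0 + 2.5 * X["TLTAR"] - 0.5 * X["NQR"] - 0.02 * X["BTPM"]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta)))

model = LogisticRegression(max_iter=1000).fit(X, y)

# PERM-FI: mean drop in AUC when each factor is randomly shuffled.
perm = permutation_importance(model, X, y, scoring="roc_auc", n_repeats=10, random_state=0)
perm_fi = pd.Series(perm.importances_mean, index=X.columns)

# GLOB-FI: Var[beta_i * x_i] = beta_i^2 * Var(x_i), normalized to sum to one.
marginal_var = pd.Series(model.coef_.ravel(), index=X.columns) ** 2 * X.var()
glob_fi = marginal_var / marginal_var.sum()

print(pd.DataFrame({"PERM-FI": perm_fi, "GLOB-FI": glob_fi}))
```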
We show a comparison of FI measures across the three IML models and the LRM below in Table 7. While the overarching observation is a complete lack of consistency across both models and FI measures, we note that overall the SHAP-FI and LIME-FI measures are the least inconsistent across models. We only show RANGE-FI for the LRM, as that is our baseline and it is not computed for the IML models in the PiML package that we use. As we noted previously in the LRM estimation results, the top three factors under RANGE-FI are NARDR, BTPM and CUR, which we deem a conceptually sound outcome for this type of PIT PD model, where the factors expected to have more importance should span the dimensions reflecting shorter-term default risk (borrower profitability, liquidity or cash management), in contrast to TTC PD models more suitable for credit underwriting, which tend to place greater importance on longer-term dimensions of default risk (i.e., capital structure, size or debt service coverage). In the case of the other IML models or the other FI measures, such factors do not consistently show up as having the most importance, and the rank orderings are rather different depending on the combination of model and FI measure.
[Table 7. Comparison of factor importance measures (RANGE-FI, PERM-FI, SHAP-FI, LIME-FI and GLOB-FI) across the LRM, ReLU-DNN, GAMI-Net and EBM models.]
First focusing on the LRM, in the case of PERM-FI the most important risk factors are the macroeconomic factors CCI and SP500EPI in 1st and 2nd place, respectively, with NARDR in 3rd and the only risk factor that overlaps with the top three factors in the RANGE-FI ranking. That said, it can be argued that it makes sense for macroeconomic factors to carry more weight in PIT models, but the fact that such factors are ranked near the bottom in RANGE-FI is hard to justify. In the case of SHAP-FI the top three are TLTAR, NQR and CTA in descending order; here two of the factors are more suitable for TTC models, and while NQR is a PIT factor, it does not make the top ranking according to any of the other FI measures for the LRM. However, in LIME-FI the 2nd and 3rd ranked factors are NARDR and CUR, respectively, which are also ranked among the top three under RANGE-FI, but the 1st ranked factor under this measure is TLTAR. Finally, for the LRM we get a completely different picture with GLOB-FI, where NQR, TLTAR and CTA take the 1st, 2nd and 3rd places, respectively.
We will conclude our discussion of FI by going over the comparison for the ReLU-DNN model, for the sake of completeness and so as not to belabor the point about complete inconsistency, as the comparisons for the GAMI-Net and EBM models are similarly incoherent. PERM-FI does feature two PIT risk factors in 1st and 2nd place, NARDR and NQR, respectively, with the macroeconomic factors SP500EPI and CCI coming in at 3rd and 4th, respectively. In contrast, the top risk factor under SHAP-FI is the TTC risk factor TLTAR, with NQR coming in 2nd and CCI 3rd. LIME-FI also ranks TLTAR at the top, but now we observe NARDR and SP500EPI in the 2nd and 3rd slots, respectively. Finally, GLOB-FI ranks the PIT risk factors NQR and CUR 1st and 2nd, respectively, with the TTC risk factor CTA coming in 3rd.
We next consider another mode of interpretability, partial dependence plots ("PDP"), a model-agnostic tool that helps visualize the relationship between a subset of risk factors and the predicted dependent variable. This allows us to determine whether the relationship is linear, monotonic or something more complex. If we have a set of risk factors $X$ and a fitted model $\hat{f}$ (which in this context of binary classification for PD modeling is the log-odds), then we form a partition into a subset of interest $X_S$ and its complement $X_C$, and define the PDP function as $\mathrm{PDP}_S(x_S) = E_{X_C}\left[\hat{f}(x_S, X_C)\right] = \int \hat{f}(x_S, X_C)\, p(X_C)\, dX_C$. This integral is approximated by the summation $\frac{1}{n}\sum_{i=1}^{n} \hat{f}(x_S, x_C^{(i)})$, where $x_C^{(i)}$ is the complementary set of risk factors in the $i$th training sample. We note here that PDPs have several limitations, the most critical being the assumption of independence amongst risk factors. If the risk factors are highly correlated then the outcomes may be inaccurate, as the averaging requires extrapolation of the model at predictor combinations that may lie far beyond the training data's multivariate envelope. However, we have tested for collinearity and have not detected an egregious problem in this regard, so we do not deem this assumption to be a fatal flaw. Also, PDPs may result in inconsistencies between global and local explanations, as PDPs provide an average view of the risk factors' influence on the predictions, and local effects specific to certain subsets of the data may differ from the global ones.
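A minimal sketch of the PDP estimator just described follows: the factor of interest is fixed at each grid value and the predicted PD is averaged over the sample (the log-odds could be averaged instead, as in the definition above). The `model` and `X` names refer to any fitted classifier with predict_proba and an accompanying DataFrame of risk factors, such as those in the earlier sketches.

```python
import numpy as np
import pandas as pd

def partial_dependence(model, X: pd.DataFrame, feature: str, grid_size: int = 20) -> pd.DataFrame:
    """One-dimensional PDP: average predicted PD with `feature` fixed at each grid value."""
    grid = np.linspace(X[feature].min(), X[feature].max(), grid_size)
    pdp_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[feature] = value                                       # fix x_S at the grid value
        pdp_values.append(model.predict_proba(X_mod)[:, 1].mean())   # average over the x_C^(i)
    return pd.DataFrame({feature: grid, "pdp": pdp_values})

# Usage (illustrative): partial_dependence(model, X, "TLTAR")
```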
The PDPs are shown below in Figures 9 through 16 for each of the risk factors. While generally speaking the three IML models considered show relationships to default risk that are as expected, as compared to the LRM there are several instances of non-monotonicities or nonsensical results. In regard to the latter observation, the ReLU-DNN model has results that are the most intuitive and the EBM tends to display patterns that are the most aberrant, while GAMI-Net is intermediate in this respect. Considering the CCI risk factor in Figure 9, the LRM and ReLU-DNN models show declines as expected, with very similar linear patterns, while the GAMI-Net model shows an increase at low levels of the factor and then a linear decrease, and the EBM model has a similar early increase followed by a discontinuous drop to a flat-line pattern. In the case of the CUR risk factor shown in Figure 10, the LRM and ReLU-DNN models both show similar linear declines, whereas the GAMI-Net model exhibits an inexplicable tent shape, and the EBM model shows a rather odd-looking blip in the middle range of the factor. The situation is rather different for the NQR risk factor in Figure 11, as we observe agreement across all models on exponential decays. The situation is again unique for the NARDR risk factor in Figure 12: in contrast to the mildly convex and monotone increase of the LRM, for the other three IML models we observe variations of a hump shape, smooth for the ReLU-DNN model, tent shaped for the GAMI-Net model and a highly irregular kind of step function for the EBM model. Looking at the BTPM risk factor in Figure 13, again we see agreement between the LRM and ReLU-DNN models in a nearly linear decline, whereas both the GAMI-Net and EBM models show an overall albeit non-monotonic decline, with the pattern in the EBM model again appearing very jagged. The CTA risk factor is shown in Figure 14, and across models there is agreement on a version of an intuitive exponential decline. Similarly, for the TLTAR risk factor in Figure 15 there is a consistent version of an intuitive exponential increase across models, where again the EBM shows the most irregular pattern. Finally, for the SP500EPI risk factor in Figure 16 the patterns are the most divergent: the LRM shows a linear decline, the ReLU-DNN model a hump shape, the GAMI-Net model a flat line and the EBM model a pattern that is difficult to describe succinctly.
Given the limitations of the PDPs noted above, we consider an alternative that is robust to these assumptions, the accumulated local effects ("ALE") plot explainability construct. ALE resolves the issues with PDPs by calculating differences in predictions instead of averages, based on the conditional distribution of the features, showing how model predictions change in a small window of the factor around a defined grid value for the data instances in that window. ALE plots average the changes in the predictions and accumulate them over a grid, mathematically represented as
$$\hat{f}_{S,\mathrm{ALE}}(x_S) = \int_{z_{0,S}}^{x_S} E_{X_C \mid X_S = z_S}\left[\hat{f}^{S}(z_S, X_C)\right] dz_S = \int_{z_{0,S}}^{x_S} \left(\int_{X_C} \hat{f}^{S}(z_S, X_C)\, dP(X_C \mid X_S = z_S)\right) dz_S$$
where the local effect $\hat{f}^{S}(x_S, x_C) = \partial \hat{f}(x_S, x_C) / \partial x_S$ is the partial derivative of the prediction with respect to the factor of interest. This is computed by replacing $z_{0,S}$ with a grid of intervals over which we compute the changes in the prediction, so instead of directly averaging the predictions, the ALE method calculates the prediction differences conditional on the factor set and accumulates (integrates) them over the grid to estimate the effect.
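A minimal first-order ALE sketch along the lines of this definition is given below, using quantile-based bins and unweighted centering for simplicity; both choices are simplifying assumptions relative to more careful implementations.

```python
import numpy as np
import pandas as pd

def ale_1d(model, X: pd.DataFrame, feature: str, n_bins: int = 20) -> pd.DataFrame:
    """First-order ALE: accumulate average local prediction differences over quantile bins."""
    edges = np.unique(np.quantile(X[feature], np.linspace(0.0, 1.0, n_bins + 1)))
    bin_idx = np.clip(np.digitize(X[feature], edges[1:-1]), 0, len(edges) - 2)

    local_effects = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        in_bin = bin_idx == k
        if not in_bin.any():
            continue
        X_lo, X_hi = X[in_bin].copy(), X[in_bin].copy()
        X_lo[feature], X_hi[feature] = edges[k], edges[k + 1]
        # Average difference in predicted PD across the bin (the local effect).
        local_effects[k] = (model.predict_proba(X_hi)[:, 1]
                            - model.predict_proba(X_lo)[:, 1]).mean()

    ale = np.cumsum(local_effects)   # accumulate the local effects over the grid
    ale -= ale.mean()                # unweighted centering, for simplicity
    return pd.DataFrame({feature: edges[1:], "ale": ale})

# Usage (illustrative): ale_1d(model, X, "TLTAR") for any fitted classifier with predict_proba.
```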
The ALE plots are shown below in Figures 17 through 24 for each of the risk factors. The first and rather obvious conclusion is that the shapes of the ALE plots differ radically from those of the PDP plots, as all of them exhibit extreme discontinuities in the form of variations on step functions, including in many cases "hockey-stick" or L- (reverse-L-) shapes. This is by design, as the y-axes are not interpreted as average PD estimates at some level of a factor but rather as the marginal change in the log-odds for incremental changes in the factor at various levels; that said, the jumps are often hard to rationalize. Furthermore, as with the PDP plots, we see inconsistencies across models and marginal changes in counterintuitive directions. First considering the CCI risk factor in Figure 17, in the LRM there is a linear decrease in the 0.4–0.6 region, with flat lines outside of this range. In the ReLU-DNN model we get a rotated hockey-stick that approximates a linear decrease below 0.6, punctuated by flatness in the 0.2–0.3 region, and is flat elsewhere. The GAMI-Net model shows a linear decline in a very narrow range of about 0.38 to 0.40, and is flat elsewhere. Finally for the CCI, the EBM model shows a completely erratic pattern, with stepwise increases at low and high levels of the factor in the respective narrow ranges of about 0.20–0.25 and 0.55–0.60, and an almost straight drop in the middle at about 0.40. The CUR risk factor is shown in Figure 18, where in the case of both the LRM and GAMI-Net models the lines are flat at zero, as the ALE scores are identically zero throughout the range of the factor, whereas in the case of the ReLU-DNN model there is a linear increase until about 0.60 after which there is a step drop to a flat line at zero, while for the EBM model we see a similar pattern in the range up to about 0.50 and then a similar drop to zero. Considering the NQR risk factor in Figure 19, there are steep, nearly L-shaped drops to a level near zero in the narrow low ranges of around 0.05–0.10 in the LRM, GAMI-Net and EBM models, while the ALE score is zero throughout in the ReLU-DNN model. Turning to the NARDR risk factor in Figure 20, for the LRM the effect is zero until a linear increase at about 0.40, while for the ReLU-DNN model there is a stepwise increase until about 0.40 followed by a linear decline, with a similar pattern in the GAMI-Net model albeit with a single step before around 0.40, and finally for the EBM a single step between flat lines occurring at around 0.05. In the case of the BTPM risk factor in Figure 21, three of the models show an overall decrease: both the GAMI-Net and EBM models exhibit a linear increase up to a flat line immediately after a nearly vertical drop at around 0.60, while only for the LRM do we get an intuitive overall linear decrease throughout most of the range, until around 0.65 where it flat-lines. For the CTA risk factor shown in Figure 22, all models agree on L-shaped overall declines: a steep linear decline in the low range below around 1.50 for the LRM, ReLU-DNN and GAMI-Net models, while the EBM model shows an almost stepwise decline at about the value of 1.80 to a flat line a little below zero. Similarly, for the TLTAR risk factor in Figure 23 there is a consistent hockey-stick overall increase across models, with the linear rise in each case starting at just below the level of 0.60.
Finally, for the SP500EPI risk factor in Figure 24 the patterns are rather divergent, with zero ALE scores throughout in both the LRM and GAMI-Net models, whereas we observe an overall hump shape (decline) consisting of a stepwise increase (an erratic decrease) in the lower range up to around 0.40, followed by a linear ascent to a peak at about 0.60 and then a linear decline (a flat line) in the ReLU-DNN (EBM) model.
We will now describe prediction residuals between the default indicator dependent variable and the predicted PD estimates, plotted against the risk factor explanatory variables. Through an examination of these plots we can analyze patterns and trends in the residuals and assess how they vary with changes in the selected risk factors. As the response variable is binary, we plot the absolute residuals separately for the classes 0 (non-default) and 1 (default), and in addition we include smoothing curves for each class estimated using a locally weighted scatterplot smoothing ("LOWESS") estimator. The accuracy plots are shown below in Figures 25 through 32 for each risk factor and model. In general, we observe that, as expected in a relatively low-default setting (or, in statistical language, an unbalanced panel where the incidence rate of the target is much lower than the non-incidence rate), the errors are much larger and closer to one for the default class and much smaller for the non-default class. We also note that there are some differences in the patterns across models and risk factors that are in some cases hard to rationalize. First considering the CCI risk factor in Figure 25, in three of the models (LRM, ReLU-DNN and GAMI-Net) we see a very similar pattern of a saw-tooth curve across most of the range of the factor very near one (a flat line very near zero) for the default (non-default) class, whereas this pattern differs in the EBM in that the trend line for the default class is rather lower than one in the lower range of the factor. The residual plot patterns are rather similar across all models for the CUR risk factor shown in Figure 26, where the errors all have a non-monotonic V-shaped spike for the default class in the middle range of around 0.50 to 0.70. The residual plot patterns are also rather similar across all models for the NQR risk factor shown in Figure 27, where the errors all have a non-monotonic, jagged series of V-shaped spikes that resolve into a flat line for the default class in the low range below around 0.30. The commonality across models for the NARDR risk factor in Figure 28 is that the residual trend line for the default class spans the entire range, but there are some differences in the patterns, with all models showing an early high and jagged pattern, followed by almost linear declines (V-shapes) at higher levels of the range for the LRM and ReLU-DNN (GAMI-Net and EBM) models. The BTPM risk factor is shown in Figure 29, and for three of the models (LRM, ReLU-DNN and GAMI-Net) we observe a similar overall increase in the default class residual trend line across the entire range from a level of about 0.80 to very near 1.00, punctuated by a V-shaped dip in the middle of the range, whereas the EBM differs in that this V plunges to a much lower level and at the higher part of the range there is another steep drop to a low level. The residual plots for the CTA risk factor are shown in Figure 30, where we observe patterns that are divergent vis-a-vis the other factors: in three cases (ReLU-DNN, GAMI-Net and EBM) an overall non-monotonic pattern with neither increase nor decrease at low levels of the factor, whereas for the LRM there is an overall linear decline over most of the range except for the low region, as well as a slight linear trend in the non-default class.
In the case of the TLTAR risk factor shown in Figure 31, unlike the other cases, in all models the default class residual trend is on the whole decreasing monotonically over most of the range (with the LRM closest to linear, the ReLU-DNN and GAMI-Net models having V-shapes in the middle range and the EBM showing a pronounced U-shape in the middle range), along with a mild linear increase, to a still low level, in the non-default trend over the whole range. Finally, we consider the SP500EPI risk factor in Figure 32 and observe across all models a non-monotonic lack of change over the entire range in the trend line for the default class, where in the case of the EBM a reverse V-shaped dip is most pronounced as compared to the other models.
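A sketch of this residual diagnostic is given below: absolute residuals are plotted against a chosen risk factor separately by response class, each with a LOWESS trend from statsmodels. The function is generic and the usage line is purely illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def residual_plot(y: np.ndarray, p_hat: np.ndarray, x: np.ndarray, name: str):
    """Absolute residuals against a risk factor, with a LOWESS trend per response class."""
    resid = np.abs(y - p_hat)
    fig, ax = plt.subplots()
    for cls, color in [(0, "tab:blue"), (1, "tab:red")]:
        mask = y == cls
        ax.scatter(x[mask], resid[mask], s=4, alpha=0.2, color=color, label=f"class {cls}")
        trend = lowess(resid[mask], x[mask], frac=0.3)   # sorted (x, smoothed residual) pairs
        ax.plot(trend[:, 0], trend[:, 1], color=color)
    ax.set_xlabel(name)
    ax.set_ylabel("|residual|")
    ax.legend()
    return fig

# Usage (illustrative): residual_plot(y_test, p_hat_test, X_test["TLTAR"].to_numpy(), "TLTAR")
```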
We now turn to an analysis of performance in terms of robustness. The performance of a model can be adversely affected when it encounters noisy data or experiences distribution shifts. Such data drift or shift may occur due to unexpected changes, which can alter the underlying patterns and relationships between the input and target variables. The robustness test assesses model performance by subjecting the model to small changes in the covariate space: perturbing a covariate $x$ by a change $\Delta x$, calculating the model's output $\hat{f}(x + \Delta x)$ on the perturbed data and then evaluating the performance metric $\mathrm{Score}(y, \hat{f}(x + \Delta x))$, here the AUC. This process is iterated multiple times (we choose ten) for all the test samples, with performance metrics recorded for each repetition. It is important to note the assumption that the response remains unchanged throughout. In the case of numerical features, as in this setting, there are two perturbation options. Raw perturbation directly adds i.i.d. Gaussian noise $N(0, \lambda^{2}\operatorname{Var}(x))$ to $x$, where $\lambda$ is the perturbation size. However, this method may not be suitable when the data are skewed and have long-tailed distributions (as is the case with our risk factors), in which case the calculation of the standard deviation may become unstable and it is relatively hard to choose a suitable perturbation size. Therefore, we choose to perform quantile perturbation in order to resolve this issue, which is implemented as follows. First, the feature is converted to the quantile space, then uniform noise $U(-0.5\lambda, 0.5\lambda)$ is added to perturb the quantiles, where $\lambda$ again represents the perturbation size, and finally we transform the quantiles back to the original space. Note that we may perform this analysis on all the covariates or on subsets of the covariates.
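The quantile perturbation step can be sketched as follows; the clipping of perturbed quantiles to [0, 1] and the use of the empirical quantile function for the back-transform are our assumptions, as the exact implementation details are not spelled out above.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata
from sklearn.metrics import roc_auc_score

def quantile_perturb(x: np.ndarray, lam: float, rng: np.random.Generator) -> np.ndarray:
    """Perturb a numeric factor in quantile space with uniform noise of size lam."""
    q = rankdata(x) / len(x)                                           # empirical quantiles
    q_pert = np.clip(q + rng.uniform(-0.5 * lam, 0.5 * lam, len(x)), 0.0, 1.0)
    return np.quantile(x, q_pert)                                      # back to the original space

def robustness_auc(model, X: pd.DataFrame, y, lam: float, n_rep: int = 10, seed: int = 0):
    """Test-sample AUC after perturbing all risk factors, repeated n_rep times."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_rep):
        X_pert = X.copy()
        for col in X.columns:
            X_pert[col] = quantile_perturb(X[col].to_numpy(), lam, rng)
        scores.append(roc_auc_score(y, model.predict_proba(X_pert)[:, 1]))
    return np.array(scores)

# Usage (illustrative): robustness_auc(model, X_test, y_test, lam=0.2) for perturbation size 0.2.
```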
In Figure 33 below we show the robustness results where we perturb all the risk factors, noting that in the key "GLM_5" denotes the LRM, and for each perturbation size ranging from 0.10 to 0.40 in increments of 0.10 we show box plots of the AUC for each model. We observe that, as expected, the AUC deteriorates as we increase the perturbation size, and the ReLU-DNN and LRM models appear to hold up best, as the variance of the AUC measure appears most stable with increasing size, while the GAMI-Net and EBM models show worse performance.
In Figures 34 through 41 below we present the robustness plots for each risk factor separately, from which we conclude that there is some material variation in robustness across models for some risk factors, and that in general, while the dispersion of the AUC consistently becomes greater with the shock, the means of the distributions do not get materially worse. Also, in general the LRM, ReLU-DNN and GAMI-Net models are most consistent in their patterns across risk factors, while the EBM is least consistent in this regard and in some cases shows counterintuitive patterns.
In the case of the CCI risk factor shown in Figure 34, the degradation is greater and similar in the LRM and ReLU-DNN models, with more downside, while the GAMI-Net and EBM models appear more similar to each other, with less degradation as well as more asymmetry. In the case of the CUR risk factor shown in Figure 35, the degradation is greater and similar in the LRM and GAMI-Net models, with more downside, while the ReLU-DNN model appears to show less degradation as well as asymmetry, and the EBM model actually shows an improvement at the greatest perturbation shock. The NQR risk factor is shown in Figure 36, and now we see that the EBM model is very resilient to shocks, whereas the LRM has the greatest drop in mean performance (although the variance of its distributions is rather stable), while the ReLU-DNN and GAMI-Net models are intermediate in robustness (although the variance of the latter's distribution is less stable as compared to the former). The NARDR risk factor shown in Figure 37 has the mean AUC getting better or no worse with the shock size, but in terms of increasing variability performance is worse in the LRM and GAMI-Net models as compared to the ReLU-DNN and EBM models, with the latter actually becoming less dispersed at the largest perturbation size.
The BTPM risk factor is shown in Figure 38, where the LRM, ReLU-DNN and GAMI-Net models have similar patterns of declining (increasing) mean (dispersion) of the AUC distribution, in order of improving overall performance, while the EBM model has a non-decreasing mean AUC and widening dispersion until the largest shock. In the case of the CTA risk factor shown in Figure 39, the patterns of decreasing mean and increasing dispersion with shock size are similar for the LRM, ReLU-DNN and GAMI-Net models, whereas again we see an odd pattern in the EBM model: a stable or even improving mean coupled with rising dispersion as the shock increases. We see a somewhat similar pattern for the TLTAR risk factor shown in Figure 40, where decreasing mean and increasing dispersion with shock size are again similar for the LRM, ReLU-DNN and GAMI-Net models, but this time the anomalous pattern in the EBM model is a stable or even improving mean (now for intermediate values of the shock rather than the maximal value) coupled with rising dispersion as the shock increases. Finally, for the SP500EPI risk factor shown in Figure 41, the pattern across models is similar to the last two cases in terms of the similarity of the first three models, as well as an anomalous pattern in the EBM model of a non-monotonic overall increase in the mean, now paired with almost no increase in the dispersion.
We will now discuss the weak region overfitting analysis for each risk factor. In this technique, we evaluate the performance metric of choice (in this case the accuracy gap, "ACC") for each sample (testing and training) as pseudo-responses, and then use a decision tree (or tree ensemble) technique to segment the range of the variable of interest into sub-regions. We then identify those sub-regions where the ACC exceeds a pre-specified threshold, subject to a minimum sample condition. In the overfit plots we identify these weak regions as positive bars, as we take the negative of the ACC deviations.
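A simplified sketch of this segmentation idea is shown below, using a shallow regression tree fitted to absolute residuals as the pseudo-response; the accuracy-gap metric, the threshold and the minimum-sample setting used in the study may differ, so these values are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

def weak_regions(x: np.ndarray, pseudo_resp: np.ndarray,
                 threshold: float, min_samples: int = 50) -> pd.DataFrame:
    """Segment a factor's range with a shallow tree and flag high-error sub-regions."""
    tree = DecisionTreeRegressor(max_depth=3, min_samples_leaf=min_samples, random_state=0)
    tree.fit(x.reshape(-1, 1), pseudo_resp)
    leaves = tree.apply(x.reshape(-1, 1))
    rows = []
    for leaf in np.unique(leaves):
        in_leaf = leaves == leaf
        rows.append({"x_min": x[in_leaf].min(), "x_max": x[in_leaf].max(),
                     "n": int(in_leaf.sum()), "mean_gap": pseudo_resp[in_leaf].mean()})
    out = pd.DataFrame(rows).sort_values("x_min").reset_index(drop=True)
    out["weak"] = out["mean_gap"] > threshold
    return out

# Usage (illustrative): weak_regions(X_test["TLTAR"].to_numpy(), np.abs(y_test - p_hat_test), 0.05)
```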
The results of the weak region overfit analysis are shown below in Figures 42 through 49. In general, there is a moderate degree of agreement amongst models as to which regions are weak, along with a material amount of disagreement. In the case of the CCI risk factor shown in Figure 42, all models show the regions around 0.10–0.20 and 0.30–0.40 to be the weakest, with the EBM showing the least deviation in ACC among them, as well as the regions around 0.80–0.85 and 0.95–1.00, although the degree of weakness there is smaller. The CUR risk factor is shown in Figure 43, where we see identified weak regions of around 0.25–0.30 in the LRM, 0.30–0.35 in all models except the LRM and 0.50–0.55 in all models. In Figure 44 for the NQR risk factor the only region identified as weak is around 0.00–0.05 for all models, in order of severity: the LRM, ReLU-DNN, GAMI-Net and EBM models. In the case of the NARDR risk factor shown in Figure 45, the LRM and ReLU-DNN models find the region around 0.60–0.65 to be weak, whereas only the LRM, and to a greater degree of severity, flags the region around 0.85–0.90. In the case of the BTPM risk factor shown in Figure 46 we observe some regions of agreement as well as disagreement across models in terms of weakness, with all models identifying narrow regions just below 0.20, just above 0.40 and just above 0.60 as such, while only the LRM similarly identifies a narrow region just above 0.30. Considering the CTA risk factor in Figure 47 we observe complete agreement across models on the regions identified as weak: below 0.10, 0.30–0.35 and 0.40–0.45. The story is much the same for the TLTAR risk factor shown in Figure 48, where almost all models to varying degrees show the region above around 0.85 as weak, with the exception of the EBM model, for which it is above around 0.95. Finally, we see another case of near agreement amongst models for the SP500EPI risk factor in Figure 49, where almost all models identify the region around 0.40–0.70 as weak, with the exception of the EBM model, which flags only the subset around 0.55–0.60.
We will now discuss the resilience distance analysis, where we assess the performance of the worst samples in a model across various out-of-distribution scenarios according to the distributional distance between the worst test sample and the full test sample, calculated for each risk factor, with the risk factors then ranked by distance. We choose the population stability index ("PSI") as the distributional distance measure and set the worst sample proportion to 10%. The resilience distribution analysis plot is shown below in Figure 50. We observe that there is general agreement across models as to which risk factors have the greatest shifts in distribution, with all models identifying the TLTAR, NQR, BTPM and CTA features as the top four in terms of shift.
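A minimal sketch of the PSI calculation between the worst-performing test sub-sample and the full test sample for a single risk factor is given below; the use of full-sample deciles as bins and the small epsilon guarding against empty bins are assumptions on our part.

```python
import numpy as np

def psi(reference: np.ndarray, comparison: np.ndarray, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population stability index of `comparison` relative to `reference`, on reference deciles."""
    edges = np.unique(np.quantile(reference, np.linspace(0.0, 1.0, n_bins + 1)))
    ref_share = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cmp_share = np.histogram(comparison, bins=edges)[0] / len(comparison) + eps
    return float(np.sum((cmp_share - ref_share) * np.log(cmp_share / ref_share)))

# Usage (illustrative): rank risk factors by
#   psi(X_test[col].to_numpy(), X_test.loc[worst_idx, col].to_numpy())
# where worst_idx indexes the 10% of test samples with the largest absolute residuals.
```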
We may also perform the resilience analysis in terms of a performance metric, such as the AUC, for each risk factor, in plots showing resiliency performance. In this analysis we examine the model's performance in the "worst-sample" scenario, varying the worst sample ratio from 0.10 to 1.0, where a ratio of 1.0 implies that all test samples are treated as worst samples, while a ratio of 0.10 signifies that only the worst 10% of the test samples are considered. In each plot, the red dotted line represents the model's performance on the entire test sample. A monotonic curve indicates that as the worst sample ratio decreases, the model's performance deteriorates; this visualization allows for a clear understanding of how the model's performance varies when different proportions of worst samples are considered.
The resiliency performance plots are shown below in Figures 51 through 58, and all of the plots are monotonic and show the expected degradation in performance as a lower proportion of samples (i.e., only the worst-performing samples) is considered, with the rank ordering of the AUC generally similar across worst sample proportions. However, at the very lowest proportions we do see some divergence in performance across models for some of the risk factors. While all of these differences are generally minor, there is a lack of consistency in the patterns across risk factors. In the case of the CCI risk factor shown in Figure 51, at the low proportions we observe the GAMI-Net model outperforming the ReLU-DNN model, while the LRM and EBM models converge. In the case of the CUR risk factor shown in Figure 52, the LRM and EBM models converge to the downside at the low proportions, while the GAMI-Net model performs best in this region relative to the ReLU-DNN model. In the case of the NQR risk factor shown in Figure 53, all models converge at the lower proportions except for the EBM model, which diverges to the downside. All models converge at the lowest proportions for the NARDR risk factor shown in Figure 54. For the BTPM risk factor shown in Figure 55, the LRM and EBM models converge, while the ReLU-DNN model diverges to the downside from the GAMI-Net model in this region. In Figure 56 we can see for the CTA risk factor that all models diverge at the lowest proportions, in decreasing order of performance: the GAMI-Net, ReLU-DNN, LRM and EBM models. A similar pattern is observed for the TLTAR risk factor in Figure 57 at the lower proportions, with the EBM model again performing worst and the ReLU-DNN model best, while the LRM and GAMI-Net models show intermediate performance, with the former crossing the latter to the upside. Finally, in the case of the SP500EPI risk factor shown in Figure 58, all models converge at the lowest proportions, with the EBM diverging to the downside.
In addition to the robustness test on all test samples, we also test model robustness on so-called worst-performing test samples. We achieve this by identifying the top percentage of test samples with the largest absolute residuals; these samples are considered the worst-performing because they have the highest discrepancies between the predicted and actual values. We then apply the perturbations to these samples. This analysis is shown below in Figure 59 for all risk factors simultaneously in terms of ACC and in Figures 60 through 67 for each risk factor individually in terms of the AUC.
In terms of the decline in mean ACC for the simultaneous analysis across all risk factors shown below in Figure 59, where we vary the worst sample proportion in increments of 10% from 10% to 40%, the most robust model is the GAMI-Net. However, when also accounting for the dispersion of the distribution, the best performing model is the ReLU-DNN, with the other two models intermediate in these regards; these conclusions differ slightly from those of the previously discussed robustness analysis.
This worst sample robustness analysis can also be conducted individually for each risk factor, where we show the distribution shift density, interpreting density to the positive side as iterations where there is degradation in the performance metric. We show these densities below in Figures 60 through 67 for a 10% proportion of worst samples and the AUC performance metric. We observe that the densities vary in shape across risk factors, but for a given risk factor they are very similar in shape across models. In the case of the CCI risk factor shown in Figure 60, we observe that the distributions are rather dispersed and that the 100% worst sample density has a very multi-modal shape, with the mean decrease in AUC of around 15% across models. In the case of the CUR risk factor shown in Figure 61, we observe that the distributions are rather concentrated in the range of about 0.40 to 0.60, and the 100% worst sample density has a very similar shape, with the mean decrease in AUC of around 13–15% across models. In the case of the NQR risk factor shown in Figure 62, we observe that the distributions are rather concentrated in the range of about 0.00 to 0.30 and very right skewed, and the 100% worst sample density has a very similar shape, with the mean decrease in AUC of around 11–18% across models. In the case of the NARDR risk factor shown in Figure 63, we observe that the distributions are rather concentrated in the range of about 0.60 to 0.90 and very left skewed, and the 100% worst sample density has a very similar shape, with the mean decrease in AUC of around 11–18% across models.
In the case of the BTPM risk factor shown in Figure 64, we observe that the distributions are rather concentrated in the range of about 0.00 to 0.40 and very right skewed, and the 100% worst sample density has a very similar shape albeit with a lower mode, with the mean decrease in AUC of around 17–20% across models. In the case of the CTA risk factor shown in Figure 65, we observe that the distributions are rather concentrated in the range of about 0.00 to 0.40 and very right skewed, and the 100% worst sample density has a very similar shape albeit with a lower mode, with the mean decrease in AUC of around 17–20% across models. In the case of the TLTAR risk factor shown in Figure 66, we observe that the distributions are rather concentrated in the range of about 0.00 to 0.60 and very right skewed, and the 100% worst sample density has a very similar shape albeit with a lower mode, with the mean decrease in AUC of around 9–21% across models. Finally, in the case of the SP500EPI risk factor shown in Figure 67, we observe that the distributions are rather dispersed, with multiple modes in the ranges of about 0.00 to 0.40, 0.50 to 0.70 and 0.90 to 1.10. They are also very right skewed, and the 100% worst sample density has a very similar shape albeit with a lower mode, with the mean decrease in AUC of around 23–25% across models.
This study has investigated alternative IML models in the context of PD modeling applied to the large corporate asset class. This is motivated by the fact that IML models have become increasingly prominent in highly regulated industries where there are concerns over the unintended consequences of black box models that may be deemed conceptually unsound. In the context of banking and in wholesale portfolios, we have noted the challenges around deploying models where the outcomes may not be explainable, both in terms of meeting business use cases as well as in satisfying model validation standards. We compared various IML models, including standard approaches such as logistic regression, using a history of corporate borrowers sourced from Moody's, a dataset used in Jacobs (2022a, 2022b).
In our comparison of various IML models (GAMI-Net, ReLU-DNN and EBM), including standard approaches such as the LRM, we found that there are material differences between the approaches in terms of dimensions such as model predictive performance and the importance of risk factors in driving outcomes. While we observed that the IML models all demonstrate some pickup in AUC performance relative to the LRM, the degree of this outperformance is modest, especially on an out-of-sample basis: the ReLU-DNN (EBM) model showed the greatest increase (decrease) out-of-sample, while the EBM (ReLU-DNN) model showed the greatest (least) increase in-sample. We also observed in a comparison of FI measures across the three IML models and the LRM an overarching lack of consistency across both models and FI measures, although we noted that overall the SHAP-FI and LIME-FI measures were the least inconsistent across models. This observation calls into question the value of the pickup in performance with the IML models, especially if these models are to be applied in contexts that must meet model validation standards.
In the analysis of PDPs for each of the risk factors we found that, while generally speaking the three IML models considered showed relationships to default risk that are as expected, as compared to the LRM there are several instances of non-monotonicities or nonsensical results. In regard to the latter observation, the ReLU-DNN model had results that were the most intuitive and the EBM model tended to display patterns that were the most aberrant, while the GAMI-Net model was intermediate in this respect. Given the limitations of the PDPs that we have noted, we also considered the ALE alternative, which is robust to certain restrictive assumptions underlying the PDP construct. We noted the primary and rather obvious conclusion that the shapes of the ALE plots differed radically from those of the PDP plots, as all of them exhibited extreme discontinuities in the form of variations on step functions, including in many cases "hockey-stick" or L- (reverse-L-) shapes. Furthermore, as with the PDP plots, we observed inconsistencies across models and marginal changes that are in counterintuitive directions.
We analyzed prediction residuals between the default indicator dependent variable and the predicted PD estimates against the risk factor explanatory variables. In general we observed that, as expected in the case of a relatively low-default setting, the errors are much larger and closer to one for the default class (much smaller for the non-default class), although there were some differences in the patterns across models and risk factors that were in some cases hard to rationalize.
In the robustness analysis where we perturbed all the risk factors, we observed that, as expected, the AUC deteriorates as we increase the perturbation size, and that the ReLU-DNN and LRM models held up best as the variance of the AUC measure appeared most stable with increasing perturbation size, while the GAMI-Net and EBM models showed worse performance in this regard. In the robustness analysis for each risk factor separately, we concluded that for a given variable there is some material variation in robustness across models for some risk factors, and in general that, while the dispersion of the AUC consistently became greater with the shock, the means of the distributions did not get materially worse. Also, in general we observed that the LRM, ReLU-DNN and GAMI-Net models were most consistent in their patterns across risk factors, while the EBM was least consistent in this regard and in some cases showed counterintuitive patterns.
In the resilience distribution analysis, where we assessed the performance of the worst samples in a model across various out-of-distribution scenarios according to the PSI distributional distance between the worst test sample and the full test sample calculated for each risk factor, we observed that there is general agreement across models as to which risk factors have the greatest shifts in distribution. In the related resiliency performance analysis in terms of the AUC performance metric, where we examined the model's performance in the "worst-sample" scenario, all the plots were observed to be monotonic and to show the expected degradation in performance as a lower proportion of samples is considered to be worst, with the rank ordering of the AUC generally similar across worst sample proportions. However, at the very lowest proportions we did see some divergence in performance across models for some of the risk factors, and while all of these differences were generally minor, there was an observed lack of consistency in the patterns across risk factors.
We also tested a variation of the above, the model robustness on so-called worst-performing test samples, achieved by identifying a top percentage of test samples with the largest absolute residuals between the predicted and actual values and then applying the perturbations to these samples. In terms of the decline in mean ACC for the simultaneous analysis across all risk factors, the most robust model was the GAMI-Net. However, in these terms and also accounting for the dispersion of the distribution, the best performing model was found to be the ReLU-DNN, with the other two models intermediate in these regards, conclusions which differed slightly from those of the previously discussed robustness analysis.
While there have been significant advances in IML that have opened the possibility for ML models to be more widely used in domains of heightened supervisory scrutiny, such as credit risk modeling, these findings suggest that the industry may have a long way to go before this aspiration is fully realized. Furthermore, as can be seen in the deep and rapidly growing literature on IML models, there are still many theoretical and technical questions that are as yet unanswered, such as which interpretability measures and IML model variants best achieve the promised objective of IML. Given these considerations, and depending upon the application of the PD model, we counsel practitioners to proceed cautiously in putting IML models into production in a champion capacity until these issues are resolved. That said, in spite of these limitations, we see scope for applying IML models in development or validation capacities, as challengers or benchmarks, as part of model testing and evaluation.
Given the wide relevance and scope of the topics addressed in this study, there is no shortage of fruitful avenues along which we could extend this research. Some proposals include but are not limited to:
● asset classes beyond the large corporate segment, such as small business, real estate or even retail;
● applications to stress testing of credit risk portfolios[4];
[4] Refer to Jacobs Jr et al. (2015), Jacobs Jr. (2020a), Jacobs Jr. (2020b) and Jacobs Jr. (2022) for studies that address model validation and model risk quantification methodologies. These studies include supervisory applications such as the comprehensive capital analysis and review ("CCAR") and current expected credit loss ("CECL") frameworks, and further feature alternative credit risk model specifications (including machine learning models), macroeconomic scenario generation techniques, as well as the quantification and aggregation of model risk (including the principle of relative entropy as studied in those papers).
● the consideration of industry specificity in model specification;
● different modeling methodologies, such as ratings migration or hazard rate models; and,
● datasets in jurisdictions apart from the U.S., or pooled data encompassing different countries with a consideration of geographical effects.
The author declares that no Artificial Intelligence (AI) tools were used in the creation of this article.
The author declares no conflict of interest. The views expressed herein are those of the author and do not necessarily represent an official position of PNC Financial Services Group.
[1] |
Abdulrahman UFI, Panford JK, Hayfron-Acquah JB (2014) Fuzzy logic approach to credit scoring for micro finances in Ghana: a case study of KWIQPUS money lending. Int J Comput Appl 94: 11–18. https://doi.org/10.5120/16362-5772 doi: 10.5120/16362-5772
![]() |
[2] | Vahid PR, Ahmadi A (2016) Modeling corporate customers' credit risk considering the ensemble approaches in multiclass classification: evidence from Iranian corporate credits. J Credit Risk 12: 71–95. |
[3] | Allen L, Peng L, Shan Y (2020) Social networks and credit allocation on fintech lending platforms. working paper, social science research network. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id = 3537714. |
[4] | Altman EI (1968) Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J Financ 23: 589–609. Available from: https://pdfs.semanticscholar.org/cab5/059bfc5bf4b70b106434e0cb665f3183fd4a.pdf. |
[5] | Altman EI, Narayanan P (1997) An international survey of business failure classification models. in financial markets, institutions and instruments. New York: New York University Salomon Center, 6. Available from: https://onlinelibrary.wiley.com/doi/abs/10.1111/1468-0416.00010. |
[6] | American Banking Association (2018) New credit score unveiled drawing on bank account data. ABA Bank J, October 22. |
[7] |
Anagnostou I, Kandhai D, Sanchez Rivero J, et al. (2020) Contagious defaults in a credit portfolio: a Bayesian network approach. J Credit Risk 16: 1–26. https://dx.doi.org/10.2139/ssrn.3446615 doi: 10.2139/ssrn.3446615
![]() |
[8] |
Bjorkegren D, Grissen D (2020) Behavior revealed in mobile phone usage predicts credit repayment. World Bank Econ Rev 34: 618–634. https://doi.org/10.1093/wber/lhz006 doi: 10.1093/wber/lhz006
![]() |
[9] | Bonds D (1999) Modeling term structures of defaultable bonds. Rev Financ Stud 12: 687–720. Available from: https://academic.oup.com/rfs/article-abstract/12/4/687/1578719?redirectedFrom = fulltext. |
[10] | Breeden J (2021) A survey of machine learning in credit risk. J Risk Model Validat 17: 1–62. |
[11] | Chava S, Jarrow RA (2004) Bankruptcy prediction with industry effects. Rev Financ 8: 537–69. Available from: https://papers.ssrn.com/sol3/papers.cfm?abstract_id = 287474. |
[12] | Clemen R (1989) Combining forecasts: a review and annotated bibliography. Int J Forecast 5: 559–583 Available from: https://people.duke.edu/~clemen/bio/Published%20Papers/13.CombiningReview-Clemen-IJOF-89.pdf. |
[13] | Coats PK, Fant LF (1993) Recognizing financial distress patterns using a neural network tool. Financ Manage 22: 142–155. Available from: https://ideas.repec.org/a/fma/fmanag/coats93.html. |
[14] | Collobert R, Weston J (2008) A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Neural Information Processing Systems, 1: 160–167. Association for Computing Machinery, New York. https://doi.org/10.1145/1390156.1390177 |
[15] | Duffie D, Singleton KJ (1999) Simulating correlated defaults. Paper presented at the Bank of England Conference on Credit Risk Modeling and Regulatory Implications Working Paper, Stanford University. Available from: https://kenneths.people.stanford.edu/sites/g/files/sbiybj3396/f/duffiesingleton1999.pdf. |
[16] | Dwyer DW, Kogacil AE, Stein RM (2004) Moody's KMV RiskCalcTM v2.1 Model. Moody's Analytics. Available from: https://www.moodys.com/sites/products/productattachments/riskcalc%202.1%20whitepaper.pdf. |
[17] | Harrell FE Jr (2018) Road Map for Choosing Between Statistical Modeling and Machine Learning, Stat Think. Available from: http://www.fharrell.com/post/stat-ml. |
[18] |
Jacobs Jr M (2022a) Quantification of model risk with an application to probability of default estimation and stress testing for a large corporate portfolio. J Risk Model Validat 15: 1–39. https://doi.org/10.21314/JRMV.2022.023 doi: 10.21314/JRMV.2022.023
![]() |
[19] |
Jacobs Jr M (2022b) Validation of corporate probability of default models considering alternative use cases and the quantification of model risk. Data Sci Financ Econ 2: 17–53. https://doi.org/10.3934/DSFE.2022002 doi: 10.3934/DSFE.2022002
![]() |
[20] |
Jarrow RA, Turnbull SM (1995) Pricing derivatives on financial securities subject to credit risk. J Fnanc 50: 53–85. https://doi.org/10.1111/j.1540-6261.1995.tb05167.x doi: 10.1111/j.1540-6261.1995.tb05167.x
![]() |
[21] | Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems, 1: 1097–1105. Available from: https://cse.iitk.ac.in/users/cs365/2013/hw2/krizhevsky-hinton-12-imagenet-convolutional-NN-deep.pdf. |
[22] | Kumar IE, Venkatasubramanian S, Scheidegger C, et al. (2020) Problems with Shapley-Value-Based Explanations as Feature Importance Measures. In International Conference on Machine Learning Research: 5491–5500. Available from: https://proceedings.mlr.press/v119/kumar20e.html. |
[23] |
Lessmann S, Baesens B, Seow HV, et al. (2015) Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Europ J Oper Res 247: 124–136. https://doi.org/10.1016/j.ejor.2015.05.030 doi: 10.1016/j.ejor.2015.05.030
![]() |
[24] |
Li K, Niskanen J, Kolehmainen M, et al. (2016) Financial innovation: Credit default hybrid model for SME lending. Expert Syst Appl 61: 343–355. https://doi.org/10.1016/j.eswa.2016.05.029 doi: 10.1016/j.eswa.2016.05.029
![]() |
[25] | Li X, Liu S, Li Z, et al. (2020) Flowscope: spotting money laundering based on graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, 34: 4731–4738. https://doi.org/10.1609/aaai.v34i04.5906 |
[26] | Lou Y, Caruana R, Gehrke J, et al. (2013) Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 623–631. https://dl.acm.org/doi/abs/10.1145/2487575.2487579 |
[27] | Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems, 4768–4777. Available from: https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html. |
[28] | McKee TE (2000) Developing a bankruptcy prediction model via rough sets theory. Intell Syst Account Financ Manage 9: 159–173. https://doi.org/fv22ks |
[29] | Merton RC (1974) On the pricing of corporate debt: The risk structure of interest rates. J Financ 29: 449–470. |
[30] | Mester LJ (1997) What's the point of credit scoring? Federal Reserve Bank of Philadelphia. Bus Rev 3: 3–16. Available from: https://fraser.stlouisfed.org/files/docs/historical/frbphi/businessreview/frbphil_rev_199709.pdf. |
[31] | Min JH, Lee YC (2005) Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Syst Appl 28: 603–614. https://doi.org/10.1016/j.eswa.2004.12.008 |
[32] | Molnar C, König G, Herbinger J, et al. (2020) Pitfalls to avoid when interpreting machine learning models. Working Paper, University of Vienna. Available from: http://eprints.cs.univie.ac.at/6427/. |
[33] | Odom MD, Sharda R (1990) A neural network model for bankruptcy prediction. In International Joint Conference on Neural Networks, 163–168. IEEE Press, Piscataway, NJ. Available from: https://ieeexplore.ieee.org/abstract/document/5726669. |
[34] | Opitz D, Maclin R (1999) Popular ensemble methods: An empirical study. J Artif Intell Res 11: 169–198. |
[35] | Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?" Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining: 1135–1144. Available from: https://dl.acm.org/doi/abs/10.1145/2939672.2939778. |
[36] | Rudin C (2019) Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell 1: 206–215. https://doi.org/10.1038/s42256-019-0048-x |
[37] | Slack D, Hilgard S, Jia E, et al. (2020) Fooling LIME and SHAP: Adversarial attacks on post hoc explanation methods. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 180–186. https://dl.acm.org/doi/abs/10.1145/3375627.3375830 |
[38] | Sudjianto A, Knauth W, Singh R, et al. (2020) Unwrapping the black box of deep ReLU networks: Interpretability, diagnostics, and simplification. arXiv preprint. Available from: https://arxiv.org/abs/2011.04041. |
[39] | Sudjianto A, Zhang A (2021) Designing inherently interpretable machine learning models. In Proceedings of ACM ICAIF 2021 Workshop on Explainable AI in Finance. ACM, New York. https://arXiv.org/abs/2111.01743 |
[40] | Sudjianto A, Zhang A, Yang Z, et al. (2023) PiML toolbox for interpretable machine learning model development and validation. arXiv preprint. https://doi.org/10.48550/arXiv.2305.04214 |
[41] | U.S. Banking Regulatory Agencies (2011) The U.S. Office of the Comptroller of the Currency and the Board of Governors of the Federal Reserve System. SR 11-7/OCC 2011-12: Supervisory Guidance on Model Risk Management. Washington, D.C. Available from: https://www.federalreserve.gov/supervisionreg/srletters/sr1107a1.pdf. |
[42] | U.S. Banking Regulatory Agencies (2021) The U.S. Office of the Comptroller of the Currency, the Board of Governors of the Federal Reserve System, the Federal Deposit Insurance Corporation, the Consumer Financial Protection Bureau, and the National Credit Union Administration. Request for Information and Comment on Financial Institutions' Use of Artificial Intelligence, Including Machine Learning. Washington, D.C. Available from: https://www.federalregister.gov/documents/2021/03/31/2021-06607/request-for-information-and-comment-on-financial-institutions-use-of-artificial-intelligence. |
[43] | U.S. Office of the Comptroller of the Currency (2021) Comptroller's Handbook on Model Risk Management. Washington, D.C. Available from: https://www.occ.treas.gov/publications-and-resources/publications/comptrollers-handbook/files/model-risk-management/index-model-risk-management.html. |
[44] | Vassiliou PC (2013) Fuzzy semi-Markov migration process in credit risk. Fuzzy Sets and Syst 223: 39–58. https://doi.org/10.1016/j.fss.2013.02.016 |
[45] | Yang Z, Zhang A, Sudjianto A (2021a) Enhancing explainability of neural networks through architecture constraints. IEEE T Neur Net Learn Syst 32: 2610–2621. https://doi.org/10.1109/TNNLS.2020.3007259 |
[46] | Yang Z, Zhang A, Sudjianto A (2021b) GAMI-Net: An explainable neural network based on generalized additive models with structured interactions. Pattern Recogn 120: 108192. https://doi.org/10.1016/j.patcog.2021.108192 |
[47] | Zhu Y, Xie C, Wang GJ, et al. (2017) Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China's SME credit risk in supply chain finance. Neural Comput Appl 28: 41–50. https://doi.org/10.1007/s00521-016-2304-x |
| GICS Segment | Defaulted Obligors | All Obligors |
| --- | --- | --- |
| Consumer Discretionary | 30.9% | 19.6% |
| Consumer Staples | 6.4% | 8.4% |
| Energy | 5.9% | 7.6% |
| Healthcare Equipment and Services | 2.9% | 2.9% |
| Industrials | 15.1% | 31.6% |
| Materials | 11.3% | 10.5% |
| Pharmaceuticals and Biotechnology | 0.2% | 2.7% |
| Software and IT Services | 1.8% | 2.5% |
| Technology Hardware and Communications | 11.3% | 4.3% |
| Utilities | 5.6% | 7.6% |
| NAICS Segment | Defaulted Obligors | All Obligors |
| --- | --- | --- |
| Agriculture, Forestry, Hunting and Fishing | 0.4% | 0.2% |
| Accommodation and Food Services | 2.9% | 2.3% |
| Waste Management and Remediation Services | 2.1% | 2.4% |
| Arts, Entertainment and Recreation | 1.0% | 0.7% |
| Construction | 2.5% | 1.7% |
| Educational Services | 0.2% | 0.1% |
| Healthcare and Social Assistance | 1.6% | 1.6% |
| Information Services | 12.1% | 11.5% |
| Management of Companies and Enterprises | 0.1% | 0.1% |
| Manufacturing | 34.4% | 37.7% |
| Mining, Oil and Gas | 8.6% | 6.8% |
| Other Services (excluding Public Administration) | 0.6% | 0.4% |
| Professional, Scientific and Technological Services | 2.5% | 2.3% |
| Real Estate, Rentals and Leasing | 1.6% | 0.9% |
| Retail Trade | 12.4% | 9.6% |
| Transportation and Warehousing | 7.0% | 5.4% |
| Utilities | 5.4% | 8.3% |
| Wholesale Trade | 2.7% | 7.0% |
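As a rough illustration (not the author's code), the shares in the two industry tables above could be tabulated from an obligor-level panel along the following lines; the dataframe `df` and the column names `gics_segment`, `naics_segment` and `default_indicator` are assumptions for this sketch.

```python
import pandas as pd

def segment_mix(df: pd.DataFrame, segment_col: str) -> pd.DataFrame:
    """Share of each industry segment among defaulted obligors versus all obligors."""
    all_obligors = df[segment_col].value_counts(normalize=True)
    defaulted = df.loc[df["default_indicator"] == 1, segment_col].value_counts(normalize=True)
    mix = pd.DataFrame({"Defaulted Obligors": defaulted, "All Obligors": all_obligors})
    return (mix.fillna(0.0) * 100).round(1)  # express as percentages, as in the tables

# Example with a hypothetical input file:
# panel = pd.read_csv("obligor_panel.csv")
# print(segment_mix(panel, "gics_segment"))
```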
| Variable | Count | Mean | Standard Deviation | Minimum | 25th Percentile | Median | 75th Percentile | Maximum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Default Indicator | 157,353 | 0.01 | 0.10 | 0.00 | 0.00 | 0.00 | 0.00 | 1.00 |
| CTA | | 0.14 | 0.35 | −0.40 | −0.01 | 0.06 | 0.17 | 3.21 |
| TLTAR | | 0.60 | 0.23 | 0.12 | 0.45 | 0.59 | 0.71 | 1.53 |
| CUR | | 1.90 | 2.84 | −22.43 | 1.41 | 2.06 | 2.65 | 19.00 |
| NARDR | | 130.25 | 101.44 | 11.26 | 68.98 | 106.74 | 159.43 | 754.09 |
| NQR | | 0.34 | 1.07 | −0.85 | −0.28 | 0.06 | 0.59 | 6.11 |
| BTPM | | 5.94 | 21.00 | −146.67 | 1.85 | 7.09 | 12.85 | 48.70 |
| SP500EPI | | 1.91 | 6.09 | −27.33 | −0.19 | 2.19 | 5.68 | 12.81 |
| CCI | | 2.34 | 21.58 | −60.97 | −7.02 | 4.89 | 15.35 | 73.21 |
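A minimal pandas sketch, assuming the same variable mnemonics as the table above, of how descriptive statistics of this form could be reproduced; it is illustrative only and not taken from the paper.

```python
import pandas as pd

RISK_FACTORS = ["CTA", "TLTAR", "CUR", "NARDR", "NQR", "BTPM", "SP500EPI", "CCI"]

def summary_statistics(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, standard deviation, minimum, quartiles, median and maximum per variable."""
    stats = df[["default_indicator"] + RISK_FACTORS].describe(percentiles=[0.25, 0.50, 0.75]).T
    return stats[["count", "mean", "std", "min", "25%", "50%", "75%", "max"]].round(2)
```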
| Category | Explanatory Variables | AUC | Missing Rate |
| --- | --- | --- | --- |
| Size | CTA | 0.726 | 8.52% |
| Leverage | TLTAR | 0.843 | 4.65% |
| Coverage | CUR | 0.788 | 7.94% |
| Efficiency | NARDR | 0.615 | 8.17% |
| Liquidity | NQR | 0.653 | 7.71% |
| Profitability | BTPM | 0.827 | 2.40% |
| Macroeconomic | SP500EPI | 0.603 | 0.00% |
| Macroeconomic | CCI | 0.607 | 0.00% |
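Single-factor AUCs and missing rates of the kind shown above could be generated with a screening routine along these lines; this is a sketch under the assumption that each candidate variable is scored on its own against the default flag before any imputation, with dataframe and column names chosen here for illustration.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def univariate_screen(df: pd.DataFrame, factors: list) -> pd.DataFrame:
    """Single-factor discriminatory power (AUC) and missing rate for each candidate variable."""
    rows = []
    for col in factors:
        observed = df[[col, "default_indicator"]].dropna()
        auc = roc_auc_score(observed["default_indicator"], observed[col])
        rows.append({
            "Explanatory Variable": col,
            # Orient the AUC so it measures strength regardless of the sign of the relationship.
            "AUC": round(max(auc, 1.0 - auc), 3),
            "Missing Rate": f"{df[col].isna().mean():.2%}",
        })
    return pd.DataFrame(rows)
```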
| Explanatory Variable | Parameter Estimate | P-Value | Factor Importance | AIC | AUC | HL P-Value | Mobility Index |
| --- | --- | --- | --- | --- | --- | --- | --- |
| CTA | −0.4837 | 0.0000 | 0.0455 | 7,231.00 | 0.9312 | 0.5945 | 0.7184 |
| TLTAR | 2.6170 | 0.0104 | 0.1091 | | | | |
| CUR | −0.0428 | 0.0000 | 0.1545 | | | | |
| NARDR | 0.0005 | 0.0000 | 0.2273 | | | | |
| NQR | −0.4673 | 0.0000 | 0.0909 | | | | |
| BTPM | −0.0161 | 0.0000 | 0.2736 | | | | |
| SP500EPI | −0.0189 | 0.0000 | 0.0759 | | | | |
| CCI | −0.0099 | 0.0000 | 0.0232 | | | | |
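For reference, a logistic regression PD model of the kind summarized above can be estimated by maximum likelihood with statsmodels; the sketch below (with an assumed dataframe `train` and assumed column names) reports the coefficients, p-values, AIC and in-sample AUC, while the Hosmer–Lemeshow, factor-importance and mobility figures in the table would require additional calculations not shown here.

```python
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

RISK_FACTORS = ["CTA", "TLTAR", "CUR", "NARDR", "NQR", "BTPM", "SP500EPI", "CCI"]

def fit_logit_pd(train):
    """Fit a logit PD model and report parameter estimates, p-values, AIC and in-sample AUC."""
    X = sm.add_constant(train[RISK_FACTORS])
    y = train["default_indicator"]
    model = sm.Logit(y, X).fit(disp=False)       # maximum likelihood estimation
    print(model.summary())                        # coefficients and p-values
    fitted_pd = model.predict(X)                  # in-sample probabilities of default
    print("In-sample AUC:", round(roc_auc_score(y, fitted_pd), 4))
    print("AIC:", round(model.aic, 2))
    return model
```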
| | | Logistic Regression Model | ReLU Activation Deep Neural Network | Generalized Additive Model with Structured Interactions | Explainable Boosting Machine Tree |
| --- | --- | --- | --- | --- | --- |
| Training Sample | AUC | 0.9312 | 0.9508 | 0.9527 | 0.9870 |
| | Accuracy | 0.9935 | 0.9941 | 0.9943 | 0.9960 |
| Testing Sample | AUC | 0.9958 | 0.9677 | 0.9668 | 0.9506 |
| | Accuracy | 0.9958 | 0.9958 | 0.9955 | 0.9965 |
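A hedged sketch of an out-of-sample comparison in the spirit of the table above, using scikit-learn's logistic regression and the `ExplainableBoostingClassifier` from the `interpret` package as the EBM benchmark; the ReLU DNN and GAMI-Net columns could be produced analogously (for example with the PiML toolbox cited in reference [40]). Data handling, the split proportion and the model settings are assumptions, not the paper's specification.

```python
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

def compare_models(X, y, seed=42):
    """Training/testing AUC and testing accuracy for a logistic regression and an EBM."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    models = {
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Explainable Boosting Machine": ExplainableBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        results[name] = {
            "Training AUC": roc_auc_score(y_tr, model.predict_proba(X_tr)[:, 1]),
            "Testing AUC": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
            "Testing Accuracy": accuracy_score(y_te, model.predict(X_te)),
        }
    return results
```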