Linked to poverty, rheumatic heart disease (RHD) disproportionately burdens the developing world yet receives less attention than other infectious diseases. Resampling and cost-sensitive learning techniques are applied to predict mortality risk from imbalanced RHD datasets. A total of 57 models were constructed, comprising 50 resampled machine learning (ML) models and 7 cost-sensitive learning models. The results of the Friedman and Nemenyi tests highlight the superior performance of the cost-sensitive support vector classification model, with an AUC of 0.888, sensitivity of 0.800, G-means of 0.806, and a Brier score of 0.061. Global and local interpretability are provided by two post-hoc interpretable ML methods, which facilitate the prioritization of key features associated with mortality risk, the determination of feature thresholds, and an understanding of how variations in these features influence patient mortality. These findings may prove clinically valuable, assisting clinicians in tailoring the precise management that is essential to maximizing the survival of RHD patients.
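The four metrics reported above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and probabilities are all assumptions made for illustration. G-means is taken here to be the geometric mean of sensitivity and specificity, and AUC is computed with the pairwise (rank-based) definition.

```python
from math import sqrt


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Return (tp, fn, tn, fp) for binary labels, where 1 is the minority (death) class."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # threshold is an assumed default
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sensitivity(y_true, y_prob, threshold=0.5):
    """Fraction of true positives that are detected (recall on the positive class)."""
    tp, fn, _, _ = confusion_counts(y_true, y_prob, threshold)
    return tp / (tp + fn)


def specificity(y_true, y_prob, threshold=0.5):
    """Fraction of true negatives that are correctly rejected."""
    _, _, tn, fp = confusion_counts(y_true, y_prob, threshold)
    return tn / (tn + fp)


def g_mean(y_true, y_prob, threshold=0.5):
    """Geometric mean of sensitivity and specificity; low if either class is neglected."""
    return sqrt(sensitivity(y_true, y_prob, threshold)
                * specificity(y_true, y_prob, threshold))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy imbalanced sample (2 deaths, 8 survivors); values are invented for illustration.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_prob = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1]
    print("AUC:", auc(y_true, y_prob))
    print("Sensitivity:", sensitivity(y_true, y_prob))
    print("G-means:", g_mean(y_true, y_prob))
    print("Brier score:", brier_score(y_true, y_prob))
```

On imbalanced data, accuracy alone can look high while most deaths are missed, which is why sensitivity and G-means are reported alongside AUC; the Brier score additionally reflects how well the predicted probabilities are calibrated.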
Citation: Yiwen Tao, Zhenqiang Zhang, Bengbeng Wang, Jingli Ren. Mortality prediction of ICU rheumatic heart disease with imbalanced data based on machine learning[J]. Big Data and Information Analytics, 2024, 8: 43-64. doi: 10.3934/bdia.2024003