A screened predictive model for esophageal squamous cell carcinoma based on salivary flora data

Yunxiang Meng; Qihong Duan; Kai Jiao; Jiang Xue

Mathematical Biosciences and Engineering

2023, Volume 20, Issue 10: 18368-18385. doi: 10.3934/mbe.2023816

Research article

1. School of Mathematics and Statistics, Xi'an JiaoTong University, Xi'an, China
2. Department of Oral Mucosal Diseases, State Key Laboratory of Military Stomatology & National Clinical Research Center for Oral Diseases & Shaanxi Key Laboratory of Stomatology, School of Stomatology, The Fourth Military Medical University, Xi'an, China

Academic Editor: Qi Zhao

Received: 25 June 2023 Revised: 11 September 2023 Accepted: 18 September 2023 Published: 25 September 2023

Abstract

Esophageal squamous cell carcinoma (ESCC) is a malignant tumor of the digestive system that arises in the esophageal squamous epithelium. Many studies have linked esophageal cancer (EC) to an imbalance of oral microecology. In this work, several machine learning (ML) models, including Random Forest (RF), Gaussian mixture model (GMM), K-nearest neighbor (KNN), logistic regression (LR), support vector machine (SVM) and extreme gradient boosting (XGBoost), based on Genetic Algorithm (GA) optimization, were developed to predict the relationship between salivary flora and ESCC by combining the relative abundance data of Bacteroides, Firmicutes, Proteobacteria, Fusobacteria and Actinobacteria in the saliva of patients with ESCC and healthy controls. In cross-validation on the entire dataset, the XGBoost model without parameter optimization performed best for ESCC diagnosis (Accuracy = 73.50%). Accuracy and the other evaluation indicators, including Precision, Recall, F1-score and the area under the receiver operating characteristic (ROC) curve (AUC), showed that XGBoost optimized by the GA (GA-XGBoost) achieved the best outcome on the testing set (Accuracy = 89.88%, Precision = 89.43%, Recall = 90.75%, F1-score = 90.09%, AUC = 0.97). The predictive ability of GA-XGBoost was then validated on phylum-level salivary microbiota data from ESCC patients and controls in an external cohort. The results of this validation (Accuracy = 70.60%, Precision = 46.00%, Recall = 90.55%, F1-score = 61.01%) illustrate the reliability of the model's predictive performance. The feature importance rankings obtained by XGBoost indicate that Bacteroides and Actinobacteria are the two most important factors in predicting ESCC. Based on these results, GA-XGBoost can predict and diagnose ESCC from the relative abundance of salivary flora, providing an effective tool for the non-invasive prediction of esophageal malignancies.
Keywords: esophageal squamous cell carcinoma, oral microecology, machine learning, saliva, flora, XGBoost
Citation: Yunxiang Meng, Qihong Duan, Kai Jiao, Jiang Xue. A screened predictive model for esophageal squamous cell carcinoma based on salivary flora data[J]. Mathematical Biosciences and Engineering, 2023, 20(10): 18368-18385. doi: 10.3934/mbe.2023816
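
The F1-scores reported in the abstract can be cross-checked against the reported Precision and Recall, since F1 is the harmonic mean of the two. The short Python sketch below does only that arithmetic. The function name `f1_score` and the `reported` dictionary are illustrative choices and are not taken from the paper. The figures are the percentages from the abstract, converted to fractions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as stated in the abstract, as fractions.
reported = {
    "testing set": (0.8943, 0.9075, 0.9009),
    "external cohort": (0.4600, 0.9055, 0.6101),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: recomputed F1 = {f1:.4f}, reported F1 = {f1_reported:.4f}")
```

Both recomputed values agree with the reported F1-scores to four decimal places. The external-cohort figures also show why F1 falls well below Recall there: a Precision of 46.00% means that just over half of the positive predictions in that cohort were false positives.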

© 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)