Research article

Comparison of proportional hazards model and random survival forest in lung cancer survival prediction

  • Published: 10 November 2025
  • This study develops and compares a traditional Cox proportional hazards nomogram and a machine learning-based random survival forest (RSF) model for predicting 1-, 3-, and 5-year overall survival in lung cancer patients, using data from the SEER database ($N = 8695$). Variables are selected through univariate screening ($p < 0.05$), multivariate Cox regression, and RSF variable importance (VIMP) analysis, which identify key prognostic factors such as TNM stage, tumor size, chemotherapy status, and patient age. Compared with the Cox proportional hazards model, the RSF model has a slightly lower concordance index (C-index), but all of its time-dependent area under the curve (AUC) values are higher. This suggests that the RSF model predicts more accurately at specific time points, which supports its clinical utility. Both models also exhibit good calibration. These interpretable prognostic tools provide valuable support for clinical decision making, enabling improved risk stratification and personalized survival estimation. The findings underscore the potential of machine learning approaches to enhance predictive accuracy in oncology.
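
To make the concordance index mentioned in the abstract concrete, the following is a minimal sketch of Harrell's C-index in plain Python. It is an illustration, not the authors' implementation: the function name `harrell_c_index`, the toy data, and the simplified handling of tied survival times (such pairs are skipped) are all assumptions made here. A pair of patients is treated as comparable when the one with the shorter follow-up had an observed event, and it counts as concordant when that patient was also assigned the higher predicted risk.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the higher-risk subject fails first.

    times       -- observed follow-up times
    events      -- 1/True if the event was observed, 0/False if censored
    risk_scores -- model output where a larger value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the subject with the shorter time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # Simplifying assumption: pairs with tied times are skipped, and a pair
        # is only comparable if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored at t = 15.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.6, 0.7, 0.1]
print(harrell_c_index(toy_times, toy_events, toy_risks))  # 0.8
```

In the toy data, five pairs are comparable and four of them are ranked correctly, giving 0.8. Because the C-index summarizes ranking over the whole follow-up period while a time-dependent AUC evaluates discrimination at one fixed horizon, the two metrics can rank models differently, as the abstract reports for the Cox and RSF models.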

    Citation: Ceshu Zheng, Zhongqiang Liu. Comparison of proportional hazards model and random survival forest in lung cancer survival prediction[J]. Big Data and Information Analytics, 2025, 9: 136-151. doi: 10.3934/bdia.2025007

  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)