Research article

Deep learning framework for early diagnosis of lung cancer using multi-modal medical imaging

  • Published: 18 December 2025
  • MSC: 62P10, 68T07, 90B50

  • Early and accurate diagnosis of lung cancer remains challenging due to the heterogeneity of tumor morphology and the variability across imaging modalities. This study proposed a deep learning framework that integrated computed tomography (CT), positron emission tomography/computed tomography (PET/CT), and chest X-ray (CXR) within a unified multi-modal transformer architecture for early lung cancer detection. The framework employed modality-specific encoders combining convolutional and state-space blocks to extract spatial-frequency representations, followed by a gated cross-modal fusion transformer designed to align heterogeneous features and handle missing modalities through mixture-of-experts routing and low-rank imputation. Multi-task heads were jointly optimized for nodule detection, segmentation, malignancy classification, and survival risk prediction. Explainability was embedded through concept bottlenecks, prototype reasoning, gradient-based attribution, and counterfactual concept editing, offering case-level interpretability and clinically meaningful evidence maps. Uncertainty was estimated via Monte-Carlo dropout, deep ensembles, and temperature scaling to ensure calibrated confidence estimates and defer-to-expert safety decisions. Experiments on the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) CT, The Cancer Imaging Archive (TCIA) PET/CT, and National Lung Screening Trial (NLST) CXR benchmark datasets showed that the proposed framework outperformed state-of-the-art methods, yielding Dice scores of 0.879, 0.872, and 0.876, AUC values of 0.944, 0.952, and 0.938, and an expected calibration error (ECE) of 0.02 across all modalities. Cross-dataset analysis under domain shift demonstrated strong generalization ($AUC > 0.92$). The framework thus offers a transparent, trustworthy, and scalable basis for AI-assisted lung cancer screening.
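
As an illustration of the fusion stage described in the abstract, the sketch below shows one way gated cross-modal fusion with mixture-of-experts routing and low-rank imputation of missing modalities could be realized in PyTorch. It is a minimal, hypothetical sketch, not the authors' implementation: the class name GatedMoEFusion, the feature dimension, the soft routing, and the impute-from-shared-context scheme are illustrative assumptions consistent with the abstract's description.

```python
import torch
import torch.nn as nn

class GatedMoEFusion(nn.Module):
    """Illustrative sketch (not the paper's code): fuse per-modality
    feature tokens with soft mixture-of-experts routing, imputing any
    missing modality from a shared context via low-rank factors."""

    def __init__(self, dim=256, n_modalities=3, n_experts=4, rank=16):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)            # soft MoE routing
        self.gate = nn.Linear(dim * n_modalities, n_modalities)
        # Low-rank factors (one pair per modality) used for imputation
        self.U = nn.Parameter(0.02 * torch.randn(n_modalities, dim, rank))
        self.V = nn.Parameter(0.02 * torch.randn(n_modalities, rank, dim))

    def forward(self, feats, present):
        # feats: (B, M, dim) per-modality tokens; present: (B, M) bool mask
        B, M, _ = feats.shape
        # Shared context = mean of the modalities that are actually present
        context = (feats * present.unsqueeze(-1)).sum(1) / \
                  present.sum(1, keepdim=True).clamp(min=1)
        # Low-rank imputation of each modality token from the context
        imputed = torch.einsum('bd,mdr,mre->bme', context, self.U, self.V)
        x = torch.where(present.unsqueeze(-1), feats, imputed)
        # Soft mixture-of-experts routing per modality token
        w = torch.softmax(self.router(x), dim=-1)          # (B, M, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-1)
        x = (expert_out * w.unsqueeze(2)).sum(-1)          # weighted expert mix
        # Gated fusion across modalities
        g = torch.softmax(self.gate(x.reshape(B, -1)), dim=-1)
        return (x * g.unsqueeze(-1)).sum(1)                # fused (B, dim)
```

For example, with feats of shape (2, 3, 256) and present = torch.tensor([[True, True, False], [True, False, True]]), the module returns a fused (2, 256) representation even though each case is missing one modality.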
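
Likewise, the reported calibration quality (ECE of 0.02) rests on temperature scaling, for which there is a standard post-hoc recipe: fit a single scalar temperature on held-out validation logits by minimizing negative log-likelihood, then report the expected calibration error by binning predictions by confidence. The sketch below illustrates that generic recipe; it is not the paper's code, and the optimizer settings and bin count are illustrative defaults.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, labels):
    """Fit one temperature T on validation logits by minimizing NLL
    (standard post-hoc temperature scaling; settings are illustrative)."""
    T = torch.ones(1, requires_grad=True)
    opt = torch.optim.LBFGS([T], lr=0.01, max_iter=100)

    def closure():
        opt.zero_grad()
        loss = F.cross_entropy(logits / T, labels)
        loss.backward()
        return loss

    opt.step(closure)
    return T.detach()

def expected_calibration_error(probs, labels, n_bins=15):
    """Standard ECE: average |accuracy - confidence| over confidence bins,
    weighted by the fraction of samples falling in each bin."""
    conf, pred = probs.max(dim=1)
    acc = pred.eq(labels).float()
    edges = torch.linspace(0, 1, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            gap = (acc[in_bin].mean() - conf[in_bin].mean()).abs()
            ece += in_bin.float().mean() * gap
    return float(ece)

# Usage: T = fit_temperature(val_logits, val_labels)
#        ece = expected_calibration_error(F.softmax(test_logits / T, dim=1),
#                                         test_labels)
```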

    Citation: Masad A. Alrasheedi, Asamh Saleh M. Al Luhayb, Abdulmajeed A. R. Alharbi. Deep learning framework for early diagnosis of lung cancer using multi-modal medical imaging[J]. AIMS Mathematics, 2025, 10(12): 29815-29852. doi: 10.3934/math.20251310






  • © 2025 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
