Most multimodal brain tumor segmentation methods assume that all modalities are available. However, models trained on complete modality data often suffer a significant performance drop when certain modalities are missing, posing a major challenge for real-world applications. In this study, we address this issue by maximizing the use of information from the remaining modalities to reduce inter-modal dependency, allowing the encoder to extract robust features from the available data for accurate tumor segmentation. To this end, we propose a novel framework, the discriminative prompt optimization network (DPONet), which incorporates frequency filtering prompts and spatial perturbation prompts to enhance the image representation space during feature extraction and fusion. To handle the various missing-modality scenarios, we also introduce a probability-based missing-data simulation method. We evaluate DPONet on two public brain tumor segmentation datasets, BraTS2018 and BraTS2020. Experimental results show that DPONet outperforms state-of-the-art methods in Dice score, HD95, and sensitivity, demonstrating its effectiveness under both complete and incomplete modality conditions.
Citation: Yaru Cheng, Yuanjie Zheng. Frequency filtering prompt tuning for medical image semantic segmentation with missing modalities[J]. Big Data and Information Analytics, 2024, 8: 109-128. doi: 10.3934/bdia.2024006
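The abstract names two mechanisms that can be made concrete with short sketches. First, a minimal sketch of what the probability-based missing-modality simulation could look like, assuming it amounts to randomly zeroing out BraTS modalities during training while guaranteeing at least one remains available; `sample_modality_mask`, `apply_mask`, and the drop probability are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of probability-based missing-modality simulation;
# names and probabilities are assumptions, not the paper's code.
import numpy as np

MODALITIES = ["T1", "T1ce", "T2", "FLAIR"]  # the four BraTS MRI sequences

def sample_modality_mask(p_drop=0.5, rng=None):
    """Draw a boolean presence mask over modalities, keeping at least one."""
    rng = rng or np.random.default_rng()
    mask = rng.random(len(MODALITIES)) >= p_drop
    if not mask.any():                       # never drop every modality
        mask[rng.integers(len(MODALITIES))] = True
    return mask

def apply_mask(volumes, mask):
    """Zero out dropped modalities of a (M, D, H, W) multi-modal volume."""
    return volumes * mask[:, None, None, None]

# Example: simulate one random missing-modality scenario for a training case
rng = np.random.default_rng(0)
volumes = rng.standard_normal((4, 8, 8, 8)).astype(np.float32)
mask = sample_modality_mask(p_drop=0.5, rng=rng)
print(dict(zip(MODALITIES, mask)), apply_mask(volumes, mask).shape)
```

Second, a sketch of the general idea behind a "frequency filtering prompt", assuming it is a learnable mask applied elementwise to a feature map's 2-D Fourier spectrum; the actual DPONet prompt design may differ.

```python
# Illustrative frequency-domain prompt: a learnable elementwise filter
# applied to the feature spectrum, then inverted back to the spatial domain.
import torch

def frequency_filter_prompt(feat, prompt_filter):
    spec = torch.fft.fft2(feat)          # (B, C, H, W) complex spectrum
    spec = spec * prompt_filter          # broadcastable learnable mask
    return torch.fft.ifft2(spec).real    # back to spatial features

feat = torch.randn(2, 16, 32, 32)
prompt_filter = torch.nn.Parameter(torch.ones(1, 1, 32, 32))
out = frequency_filter_prompt(feat, prompt_filter)
print(out.shape)  # torch.Size([2, 16, 32, 32])
```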