Alzheimer's disease (AD) is a progressive neurodegenerative disorder that imposes a substantial burden on families and healthcare systems. Mild cognitive impairment (MCI), an intermediate stage between normal aging and AD, can be further divided into progressive MCI (pMCI) and stable MCI (sMCI) based on follow-up outcomes. Unlike the marked differences observed between cognitively normal (CN) individuals and AD patients, sMCI and pMCI share highly similar characteristics, making early identification of pMCI extremely challenging. Although deep learning methods based on structural magnetic resonance imaging (sMRI) have advanced AD classification, research on predicting MCI progression remains limited, both because of the high similarity between sMCI and pMCI and because of the substantial cost of prospectively collecting longitudinal data. Accurate early identification of pMCI is essential for timely intervention, slowing disease progression, and reducing healthcare costs, and it is the focus of this study. We propose a novel vision Transformer framework, the multi-scale multi-domain frequency-aware vision Transformer (MMF-ViT), which employs a multi-scale cross-domain fusion (MSCDF) module to enable deep interaction between spatial- and frequency-domain features, thereby enhancing the modeling of fine-grained brain structural variations. A multi-scale frequency encoder (MSFE) and a multi-scale context encoder (MSCE) are designed to extract and fuse frequency and spatial information, effectively improving classification performance. Experimental results on the ADNI dataset demonstrate that MMF-ViT achieves an accuracy of 72.84% and an AUC of 72.99% for sMCI versus pMCI classification, significantly outperforming mainstream 2D and 3D models. In AD vs. CN classification, MMF-ViT also achieves an accuracy of 85.59%, highlighting its strong feature representation capability and practical potential.
Citation: Ying Liu, XiaoLi Yang. MMF-ViT: A multi-scale multi-domain frequency-aware vision Transformer for MRI-based Alzheimer's classification[J]. Electronic Research Archive, 2025, 33(10): 5916-5936. doi: 10.3934/era.2025263
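The abstract describes fusing spatial- and frequency-domain views of sMRI features at multiple scales. As a minimal illustrative sketch only — not the authors' MSCDF/MSFE/MSCE implementation, whose details are not given here — the core idea can be mimicked on a 2D slice with a log-magnitude FFT as the frequency view, strided subsampling as stand-in multi-scale branches, and channel stacking as a toy cross-domain fusion:

```python
import numpy as np

def frequency_features(img: np.ndarray) -> np.ndarray:
    """Frequency-domain view: centered log-magnitude spectrum of a 2D slice."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spectrum))

def multiscale(img: np.ndarray, scales=(1, 2, 4)) -> list:
    """Stand-in for multi-scale branches: subsample the slice at several strides."""
    return [img[::s, ::s] for s in scales]

def fuse(spatial: np.ndarray, freq: np.ndarray) -> np.ndarray:
    """Toy cross-domain fusion: stack the two views as channels of one tensor."""
    return np.stack([spatial, freq], axis=0)

# Synthetic 64x64 "slice" standing in for an sMRI section.
rng = np.random.default_rng(0)
slice_ = rng.standard_normal((64, 64))

# Fuse spatial and frequency views at each scale.
fused = [fuse(s, frequency_features(s)) for s in multiscale(slice_)]
print([f.shape for f in fused])  # [(2, 64, 64), (2, 32, 32), (2, 16, 16)]
```

In the actual MMF-ViT, this fusion is learned (attention-based interaction between the two domains) rather than a fixed stack, but the sketch shows why the frequency view is complementary: the spectrum summarizes texture and fine structural variation that local spatial patches capture less directly.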