Attention-enhanced hybrid deep learning framework for Monkeypox skin lesion classification

Zhonghua Zhang; Siying Zheng

doi:10.3934/era.2026034

Electronic Research Archive

2026, Volume 34, Issue 2: 738-776. doi: 10.3934/era.2026034

Research article

School of Science, Xi'an University of Science and Technology, Xi'an 710600, China

Received: 16 November 2025; Revised: 06 January 2026; Accepted: 14 January 2026; Published: 22 January 2026

Abstract

Monkeypox, a re-emerging zoonotic disease, has become a growing global public health concern because of its rapid transmission and its close visual resemblance to other dermatological conditions such as chickenpox and measles. This resemblance complicates early clinical diagnosis, particularly in resource-limited settings where laboratory testing capacity is scarce. Deep learning methods that detect monkeypox from skin images offer a promising alternative to manual inspection and PCR-based diagnosis. However, existing approaches are often limited by poor dataset quality, weak generalization, and insufficient model interpretability. To address these challenges, this paper proposes an attention-enhanced hybrid deep learning framework for the automated classification of monkeypox skin lesions. Specifically, the proposed model employs DenseNet-121 and EfficientNet-B4 as parallel convolutional feature extractors and integrates Convolutional Block Attention Modules (CBAM) to adaptively emphasize lesion-related features while suppressing background interference. Experimental results show that the proposed framework outperforms conventional deep learning baselines, achieving an accuracy exceeding 91% and a Cohen's Kappa score above 0.88 on the primary dataset. The model also generalizes well, maintaining classification accuracies above 90% across multiple external public datasets. To enhance transparency and clinical reliability, explainable artificial intelligence (XAI) methods are employed to visualize the model's decision-making process. In addition, quantitative interpretability metrics are used to assess the reliability and consistency of the generated explanations, highlighting the model's potential for practical clinical application.

Keywords: Monkeypox, image classification, deep learning, CBAM

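For readers unfamiliar with CBAM, the attention step named in the abstract can be summarized with the standard formulation from the original CBAM paper (Woo et al., 2018). This is a sketch of that general formulation only. The abstract does not state the kernel size, the reduction ratio of the shared MLP, or where the modules sit relative to the DenseNet-121 and EfficientNet-B4 branches, so none of those details should be read as the authors' configuration. Here $F$ is an intermediate feature map, $\sigma$ is the sigmoid function, $\otimes$ is element-wise multiplication with broadcasting, and $f^{7\times 7}$ is a convolution with a $7\times 7$ kernel (the default in the original CBAM paper, assumed here).

```latex
\begin{aligned}
M_c(F)  &= \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big), \\
F'      &= M_c(F) \otimes F, \\
M_s(F') &= \sigma\big(f^{7\times 7}([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')])\big), \\
F''     &= M_s(F') \otimes F'.
\end{aligned}
```

The channel map $M_c$ reweights feature channels using pooled statistics over the spatial dimensions. The spatial map $M_s$ then reweights locations using statistics pooled over the channel dimension, which is the mechanism by which lesion regions can be emphasized over background.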
Citation: Zhonghua Zhang, Siying Zheng. Attention-enhanced hybrid deep learning framework for Monkeypox skin lesion classification[J]. Electronic Research Archive, 2026, 34(2): 738-776. doi: 10.3934/era.2026034
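The abstract reports Cohen's Kappa alongside accuracy. As a minimal sketch of how that statistic is computed from labels and predictions, the following pure-Python function implements the standard definition, kappa = (p_o - p_e) / (1 - p_e). The function name `cohens_kappa` and the toy class labels are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)

    # Observed agreement p_o: fraction of samples where prediction equals label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement p_e: sum over classes of the product of marginal frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)

    if math.isclose(p_e, 1.0):
        # Both sequences contain a single identical class; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels, hypothetical and not taken from the paper's datasets.
y_true = ["mpox", "mpox", "chickenpox", "measles", "normal", "normal"]
y_pred = ["mpox", "chickenpox", "chickenpox", "measles", "normal", "normal"]
print(round(cohens_kappa(y_true, y_pred), 4))  # 0.7778
```

In this toy case the accuracy is 5/6, the chance agreement is 0.25, and kappa is 7/9. Because kappa discounts the agreement expected by chance, it is a useful companion to accuracy when class frequencies are imbalanced.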

© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)