A probabilistic fusion and meta-logistic calibration model for multiclass hybrid ensemble learning

Khaled Mahmud Sujon; Adnan Shafi; Iftekhar Uddin Ahmed; Wided Bouchelligua; Amel Ksibi; Md Abdus Samad

doi:10.3934/math.2026719

AIMS Mathematics

2026, Volume 11, Issue 6: 17584-17634

Research article

1. Department of Software Engineering, Faculty of Computing, Universiti Teknologi Malaysia (UTM), Johor Bahru 81310, Johor, Malaysia
2. Department of Industrial and Systems Engineering, Lamar University, Beaumont 77710, Texas, USA
3. Faculty of Computing, Universiti Teknologi Malaysia, Johor Bahru 81310, Johor, Malaysia
4. Applied College, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11432, Saudi Arabia
5. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P. O. Box 84428, Riyadh 11671, Saudi Arabia
6. Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, South Korea

Received: 16 February 2026 Revised: 30 April 2026 Accepted: 13 May 2026 Published: 17 June 2026
MSC: 62H30, 68T01, 68T05

Abstract

Multiclass classification in educational data mining presents persistent challenges, including class imbalance, miscalibrated probability outputs, and insufficient statistical validation. We propose RXK-VEM, a hybrid ensemble framework that integrates random forest (RF), extreme gradient boosting (XGBoost), and K-nearest neighbors (KNN) through a formally defined vote-entropy-weighted meta-fusion (VEM) operator, followed by meta-level calibration using multinomial logistic regression. The VEM operator is defined as a mapping on the probability simplex $ \Delta^{C-1} $ that aggregates heterogeneous base learner outputs into a unified probabilistic representation with provable closure properties. We further establish a Rademacher complexity-based generalization bound showing that operating in the compressed $ C $-dimensional probability space ($ C \ll d $, with $ d $ the raw feature dimension) tightens the generalization gap relative to classifiers trained directly on raw features, providing theoretical justification for the stacking architecture. We validate RXK-VEM on two structurally distinct educational datasets: a primary academic performance dataset ($ N = 560 $, five classes) from Universiti Teknologi Malaysia and a secondary student dropout dataset ($ N = 4{,}424 $, three classes) from the University of California, Irvine (UCI) repository. On the primary dataset, RXK-VEM achieves 91.07% accuracy, 91.22% precision, and a Matthews correlation coefficient (MCC) of 86.21%, outperforming all individual base learners and conventional ensemble strategies. On the secondary dataset, the model achieves 77.30% accuracy and an MCC of 62.33%, maintaining competitive performance across all metrics. Statistical validation through five-fold stratified cross-validation, paired $ t $-tests, and Wilcoxon signed-rank tests confirms that improvements over weaker baselines are consistent and not attributable to random variation. A systematic ablation study quantifies the complementary contribution of each base learner, and Shapley additive explanations (SHAP) analysis validates the interpretability of the identified predictors. The proposed framework offers a mathematically rigorous, empirically validated, and interpretable architecture for probabilistic ensemble integration in multiclass educational prediction tasks.
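The abstract does not spell out the VEM operator, so the following is only a minimal sketch of one plausible entropy-weighted fusion rule, not the authors' definition. It assumes each base learner returns a probability vector over $ C $ classes, weights each learner by one minus its normalized Shannon entropy (so more confident learners count more), and forms a convex combination. The function names `normalized_entropy`, `entropy_weights`, and `fuse`, as well as the sample probabilities, are hypothetical. Because the weights are nonnegative and sum to one, the fused vector stays on the simplex, which illustrates the closure property in miniature.

```python
import math

def normalized_entropy(p):
    """Shannon entropy of p divided by log(C), so the result lies in [0, 1]."""
    c = len(p)
    h = -sum(x * math.log(x) for x in p if x > 0.0)
    return h / math.log(c)

def entropy_weights(probs, eps=1e-9):
    """Assumed weighting rule: confidence = 1 - normalized entropy, then normalize."""
    conf = [1.0 - normalized_entropy(p) + eps for p in probs]
    total = sum(conf)
    return [w / total for w in conf]

def fuse(probs):
    """Convex combination of the learners' probability vectors."""
    w = entropy_weights(probs)
    c = len(probs[0])
    return [sum(w[m] * probs[m][k] for m in range(len(probs))) for k in range(c)]

# Hypothetical outputs of three base learners (e.g., RF, XGBoost, KNN) for one sample.
rf = [0.70, 0.20, 0.10]
xgb = [0.60, 0.30, 0.10]
knn = [0.34, 0.33, 0.33]  # nearly uniform, so it receives a small weight

fused = fuse([rf, xgb, knn])
print(fused, sum(fused))
```

In a pipeline like the one described above, vectors such as `fused` would then serve as inputs to the multinomial logistic regression meta-learner.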
Keywords:
- hybrid ensemble learning
- meta-learning
- XGBoost
- educational data mining
- explainable AI
- interpretable machine learning
- statistical validation
Citation: Khaled Mahmud Sujon, Adnan Shafi, Iftekhar Uddin Ahmed, Wided Bouchelligua, Amel Ksibi, Md Abdus Samad. A probabilistic fusion and meta-logistic calibration model for multiclass hybrid ensemble learning[J]. AIMS Mathematics, 2026, 11(6): 17584-17634. doi: 10.3934/math.2026719
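Two of the evaluation tools named in the abstract, the multiclass MCC and the paired $ t $-test over cross-validation folds, can be sketched with the standard library alone. This is a hedged illustration rather than the authors' evaluation code: the labels and per-fold accuracies below are invented for the example and are not results from the paper, and the function names are hypothetical. The MCC uses the standard multiclass generalization computed from class counts. The abstract's percentages presumably correspond to this coefficient multiplied by 100.

```python
import math
import statistics

def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient from class counts."""
    labels = sorted(set(y_true) | set(y_pred))
    s = len(y_true)                                       # total samples
    c = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # correct predictions
    t_k = [sum(1 for t in y_true if t == k) for k in labels]  # true count per class
    p_k = [sum(1 for p in y_pred if p == k) for k in labels]  # predicted count per class
    num = c * s - sum(t * p for t, p in zip(t_k, p_k))
    den = math.sqrt((s * s - sum(p * p for p in p_k)) *
                    (s * s - sum(t * t for t in t_k)))
    return num / den if den > 0 else 0.0

def paired_t_statistic(a, b):
    """Paired t statistic for matched scores; degrees of freedom = len(a) - 1."""
    d = [x - y for x, y in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))

# Invented labels for a three-class toy problem.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
mcc = multiclass_mcc(y_true, y_pred)

# Invented per-fold accuracies for an ensemble and one baseline (five folds).
ensemble_acc = [0.91, 0.90, 0.92, 0.89, 0.93]
baseline_acc = [0.88, 0.87, 0.90, 0.86, 0.89]
t_stat = paired_t_statistic(ensemble_acc, baseline_acc)

print(f"MCC = {mcc:.4f}, paired t = {t_stat:.4f} (df = {len(ensemble_acc) - 1})")
```

With five folds the test has four degrees of freedom, so a two-sided test at the 5% level compares the absolute value of `t_stat` against the critical value 2.776.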

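For orientation, the kind of statement a Rademacher complexity-based bound makes can be written in its standard textbook form. This is not the paper's theorem, only the generic result under the assumptions that the loss $ \ell $ takes values in $ [0, 1] $ and the $ n $ training samples are i.i.d.; the symbols $ \mathcal{H} $, $ R $, $ \widehat{R}_n $, $ \delta $, $ B $, and $ r $ are introduced here for illustration. With probability at least $ 1 - \delta $, every $ h \in \mathcal{H} $ satisfies

$$ R(h) \le \widehat{R}_n(h) + 2\,\mathfrak{R}_n(\ell \circ \mathcal{H}) + \sqrt{\frac{\ln(1/\delta)}{2n}}. $$

For a norm-bounded linear class $ \{ z \mapsto \langle w, z \rangle : \|w\|_2 \le B \} $ on inputs with $ \|z\|_2 \le r $, the complexity term obeys

$$ \mathfrak{R}_n \le \frac{B\,r}{\sqrt{n}}, $$

and any $ z \in \Delta^{C-1} $ satisfies $ \|z\|_2 \le \|z\|_1 = 1 $, so $ r \le 1 $ for a meta-learner fed simplex-valued inputs. This suggests one way the probability-space representation can keep the complexity term small; the constants and assumptions of the paper's own bound may differ.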


© 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)