Research article

A probabilistic fusion and meta-logistic calibration model for multiclass hybrid ensemble learning

  • Published: 17 June 2026
  • MSC : 62H30, 68T01, 68T05

  • Multiclass classification in educational data mining presents persistent challenges, including class imbalance, miscalibrated probability outputs, and insufficient statistical validation. We propose RXK-VEM, a hybrid ensemble framework that integrates random forest (RF), extreme gradient boosting (XGBoost), and K-nearest neighbors (KNN) through a formally defined vote-entropy-weighted meta-fusion (VEM) operator, followed by meta-level calibration using multinomial logistic regression. The VEM operator is defined as a mapping on the probability simplex $ \Delta^{C-1} $, aggregating heterogeneous base learner outputs into a unified probabilistic representation with provable closure properties. We further establish a Rademacher complexity-based generalization bound showing that operating in the compressed $ C $-dimensional probability space ($ C \ll d $) tightens the generalization gap relative to classifiers trained directly on raw features, providing theoretical justification for the stacking architecture. We validate RXK-VEM on two structurally distinct educational datasets: a primary academic performance dataset ($ N = 560 $, five classes) from Universiti Teknologi Malaysia and a secondary student dropout dataset ($ N = 4,424 $, three classes) from the University of California, Irvine (UCI) repository. On the primary dataset, RXK-VEM achieves 91.07% accuracy, 91.22% precision, and an 86.21% Matthews correlation coefficient (MCC), outperforming all individual base learners and conventional ensemble strategies. On the secondary dataset, the model achieves 77.30% accuracy and a 62.33% MCC, maintaining competitive performance across all metrics. Statistical validation through five-fold stratified cross-validation, paired $ t $-tests, and Wilcoxon signed-rank tests confirms that improvements over weaker baselines are consistent and not attributable to random variation. A systematic ablation study quantifies the complementary contribution of each base learner, and Shapley additive explanations (SHAP) analysis validates the interpretability of the identified predictors. The proposed framework offers a mathematically rigorous, empirically validated, and interpretable architecture for probabilistic ensemble integration in multiclass educational prediction tasks.
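
The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact form of the VEM operator, so the weighting rule below (weight proportional to one minus normalized Shannon entropy) and the names `shannon_entropy` and `entropy_weighted_fusion` are assumptions made for illustration only, not the authors' definition. The sketch shows one property the abstract does state: because the fused vector is a convex combination of points on the simplex, it stays on the simplex. The meta-level multinomial logistic regression stage is omitted.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def entropy_weighted_fusion(prob_vectors):
    """Fuse per-learner class-probability vectors into one vector.

    Assumed rule (hypothetical, not taken from the paper): each learner's
    raw weight is 1 - H(p) / log(C), so more confident learners count more.
    Weights are then normalized to sum to 1.
    """
    n_classes = len(prob_vectors[0])
    max_entropy = math.log(n_classes)
    # Clamp at zero to guard against tiny negative values from rounding.
    raw = [max(0.0, 1.0 - shannon_entropy(p) / max_entropy)
           for p in prob_vectors]
    total = sum(raw)
    if total == 0.0:
        # Every learner is uniform: fall back to equal weights.
        weights = [1.0 / len(prob_vectors)] * len(prob_vectors)
    else:
        weights = [w / total for w in raw]
    # Convex combination of simplex points, so the result is on the simplex.
    fused = [sum(w * p[c] for w, p in zip(weights, prob_vectors))
             for c in range(n_classes)]
    return fused, weights


# Illustrative (made-up) outputs of three base learners for one sample.
rf_probs = [0.70, 0.20, 0.10]
xgb_probs = [0.60, 0.30, 0.10]
knn_probs = [0.34, 0.33, 0.33]

fused, weights = entropy_weighted_fusion([rf_probs, xgb_probs, knn_probs])
predicted_class = fused.index(max(fused))
```

In a stacking setup of the kind the abstract describes, fused vectors like `fused` would serve as the low-dimensional ($ C $-dimensional) input to the meta-level calibrator rather than the raw $ d $-dimensional features.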

    Citation: Khaled Mahmud Sujon, Adnan Shafi, Iftekhar Uddin Ahmed, Wided Bouchelligua, Amel Ksibi, Md Abdus Samad. A probabilistic fusion and meta-logistic calibration model for multiclass hybrid ensemble learning[J]. AIMS Mathematics, 2026, 11(6): 17584-17634. doi: 10.3934/math.2026719



  • © 2026 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)