Classification of imbalanced data streams in nonstationary environments poses a significant challenge in machine learning. Ensemble learning has demonstrated efficacy in handling both imbalanced data streams and concept drift. However, most existing methods develop a separate strategy for each aspect, overlooking the relationships and interactions between them, which limits the performance those strategies can achieve. To address this issue, an adaptive dual dynamic ensemble selection (AD-DES) method was proposed for classifying imbalanced data streams with concept drift. First, an adaptive equalization resampling (AER) strategy was proposed to obtain balanced data chunks, reducing the risk of overfitting or insufficient sampling caused by severe class imbalance. The balanced chunk produced by AER was then stored and used to balance subsequent chunks. Second, a dual dynamic ensemble selection (D-DES) strategy was introduced that performs two rounds of selection over the classifier pool to obtain the optimal ensemble model. Finally, an adaptive drift detector was integrated into AD-DES, enabling the model to adjust to newly emerging concepts in nonstationary environments. Experiments showed that AD-DES outperforms nine comparison algorithms in classification precision and robustness across ten synthetic datasets and five real-world datasets featuring diverse forms of concept drift.
Citation: Ziyan Mo, Li Deng, Bo Wei, Jiakai Chen, Aixi Chen. AD-DES: An adaptive dual dynamic ensemble selection for imbalanced data streams[J]. Electronic Research Archive, 2025, 33(11): 6577-6609. doi: 10.3934/era.2025291
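The two mechanisms sketched in the abstract (balancing each incoming chunk with minority examples stored from earlier chunks, then running two rounds of selection over a classifier pool) can be illustrated in miniature. Everything below is an illustrative assumption, not the paper's actual AER or D-DES algorithms: the names `balance_chunk` and `dual_select`, the accuracy-floor criterion for the first selection round, and the top-k rule for the second are all hypothetical stand-ins.

```python
from collections import Counter

def balance_chunk(chunk, minority_store):
    """Return a class-balanced copy of `chunk` (a list of (x, y) pairs).

    The minority class is padded with minority examples stored from
    earlier chunks, falling back to duplicating this chunk's own
    minority examples when the store is empty.
    """
    counts = Counter(y for _, y in chunk)
    if len(counts) != 2:
        return list(chunk)
    minority, majority = sorted(counts, key=counts.get)  # ascending by count
    deficit = counts[majority] - counts[minority]
    pool = [ex for ex in minority_store if ex[1] == minority]
    if not pool:
        pool = [ex for ex in chunk if ex[1] == minority]
    balanced = list(chunk) + [pool[i % len(pool)] for i in range(deficit)]
    # Keep this chunk's minority examples to help balance later chunks.
    minority_store.extend(ex for ex in chunk if ex[1] == minority)
    return balanced

def accuracy(clf, data):
    return sum(clf(x) == y for x, y in data) / len(data)

def dual_select(pool, val_chunk, acc_floor=0.5, k=2):
    """Two rounds of selection over a classifier pool:
    round 1 drops classifiers below an accuracy floor,
    round 2 keeps the top-k survivors."""
    scored = [(accuracy(clf, val_chunk), clf) for clf in pool]
    survivors = [p for p in scored if p[0] >= acc_floor]   # round 1
    survivors.sort(key=lambda p: p[0], reverse=True)
    return [clf for _, clf in survivors[:k]]               # round 2

# Imbalanced chunk: 10 majority examples (y=1), 2 minority (y=0).
chunk = [(x / 10, 1) for x in range(10)] + [(0.0, 0), (0.1, 0)]
store = []
balanced = balance_chunk(chunk, store)
counts = Counter(y for _, y in balanced)

# A tiny classifier pool of threshold rules over a scalar feature.
pool = [lambda x: int(x > 0.45),   # reasonable threshold
        lambda x: 1,               # always predicts majority
        lambda x: int(x > 0.95)]   # almost always predicts 0
val = [(0.2, 0), (0.3, 0), (0.6, 1), (0.8, 1)]
ensemble = dual_select(pool, val, acc_floor=0.6, k=2)
```

In this toy run, `balance_chunk` pads the two minority examples up to parity with the majority class, and `dual_select` discards the two weak threshold rules in round 1, leaving only the well-placed one; in the paper's setting, the pool would hold trained stream classifiers and the validation chunk would be the most recent balanced chunk.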