

¹ Research Scholar, GITAM University, Telangana, Hyderabad, India
² Sambalpur University Institute of Information Technology, Sambalpur, Orissa, India

Data stream mining has become very popular in recent years, as advanced electronic devices generate continuous data streams. The performance of standard learning algorithms is compromised by the imbalanced nature of real-world data streams. In this paper, we propose an algorithm called Increment Under Sampling for Data Streams (IUSDS), which uses a unique under-sampling technique to nearly balance the data sets and thereby minimize the effect of imbalance on the stream mining process. Our experimental analysis suggests that the proposed algorithm improves knowledge discovery over benchmark algorithms such as C4.5 and Hoeffding tree in terms of standard performance measures, namely accuracy, AUC, precision, recall, F-measure, TP rate, FP rate and TN rate.
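The abstract does not say how IUSDS decides which instances to discard, so the sketch below is only a rough, generic illustration and is not the authors' method. It assumes a binary-labelled stream split into fixed-size chunks and applies plain random under-sampling to each chunk (a swapped-in technique), then shows how the listed measures follow from a confusion matrix: TP rate is the same quantity as recall, and TN rate equals 1 - FP rate. The names `chunked`, `undersample_chunk`, `binary_metrics`, `auc` and the `ratio` parameter are hypothetical.

```python
import random
from collections import Counter
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

# One stream instance: (feature vector, class label). Binary labels are assumed.
Instance = Tuple[List[float], int]


def chunked(stream: Iterable[Instance], size: int) -> Iterator[List[Instance]]:
    """Group a (possibly unbounded) stream into consecutive chunks of `size` items."""
    chunk: List[Instance] = []
    for inst in stream:
        chunk.append(inst)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final partial chunk, if any
        yield chunk


def undersample_chunk(chunk: Sequence[Instance], ratio: float = 1.0,
                      seed: Optional[int] = None) -> List[Instance]:
    """Generic random under-sampling (hypothetical, NOT the IUSDS rule):
    keep every minority instance and at most ratio * |minority| majority ones."""
    counts = Counter(label for _, label in chunk)
    if len(counts) < 2:  # only one class present: nothing to rebalance
        return list(chunk)
    minority_label = min(counts, key=counts.get)
    minority = [x for x in chunk if x[1] == minority_label]
    majority = [x for x in chunk if x[1] != minority_label]
    keep = min(len(majority), int(round(ratio * len(minority))))
    rng = random.Random(seed)
    balanced = minority + rng.sample(majority, keep)
    rng.shuffle(balanced)  # avoid presenting all minority instances first
    return balanced


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> dict:
    """Threshold-based measures named in the abstract, from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def safe_div(a: float, b: float) -> float:
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # identical to the TP rate
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "tp_rate": recall,
        "fp_rate": safe_div(fp, fp + tn),
        "tn_rate": safe_div(tn, tn + fp),  # equals 1 - FP rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """AUC as the probability that a random positive outscores a random negative
    (Mann-Whitney statistic, ties count one half). O(n_pos * n_neg), sketch only."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative instance")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a streaming setting, each balanced chunk would then be passed to an incremental learner (for example a Hoeffding tree) before the next chunk arrives; that learner is not shown here. Random under-sampling was chosen only because it is the simplest way to bring a chunk close to balance.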

Keywords: Knowledge Discovery; Data Streams; Imbalanced data; Oversampling; Increment Under Sampling for Data Streams (IUSDS)

Citation: Anupama N, Sudarson Jena. A Novel Approach Using Incremental Under Sampling for Data Stream Mining. Big Data and Information Analytics, 2018, 3(1): 1-13. doi: 10.3934/bdia.2017017





© 2018 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)


Copyright © AIMS Press All Rights Reserved