Forward Supervised Discretization for Multivariate with Categorical Responses

School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A proper supervised discretization usually achieves better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12, 13], we suggest a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda are presented to support this claim.
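As a rough illustration of the two association measures named in the abstract, the sketch below computes the Goodman-Kruskal lambda and tau [8] for a categorical explanatory variable against a categorical response, and uses them to compare two candidate cut points on a continuous variable. The function names `gk_lambda` and `gk_tau`, the toy data, and the single-threshold binarization are assumptions made for illustration; this is not the forward algorithm proposed in the paper, only the kind of score such an algorithm could compare.

```python
from collections import Counter


def gk_lambda(xs, ys):
    """Goodman-Kruskal lambda for predicting ys from xs (hypothetical helper)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_marg = Counter(ys)
    # Largest joint count within each category of x.
    best_per_x = {}
    for (x, _y), count in joint.items():
        best_per_x[x] = max(best_per_x.get(x, 0), count)
    top_y = max(y_marg.values())
    if top_y == n:  # response is constant, nothing left to explain
        return 0.0
    return (sum(best_per_x.values()) - top_y) / (n - top_y)


def gk_tau(xs, ys):
    """Goodman-Kruskal tau for predicting ys from xs (hypothetical helper)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_marg = Counter(xs)
    y_marg = Counter(ys)
    # Sum over y of p(y)^2.
    sq_y = sum((count / n) ** 2 for count in y_marg.values())
    if sq_y == 1.0:  # response is constant
        return 0.0
    # Sum over (x, y) of p(x, y)^2 / p(x), written with raw counts.
    cond = sum(count * count / (n * x_marg[x]) for (x, _y), count in joint.items())
    return (cond - sq_y) / (1.0 - sq_y)


# Toy data (assumed): one continuous explanatory variable, one binary response.
values = [1, 2, 3, 4, 5, 6]
response = ["a", "a", "a", "b", "b", "b"]

# Score two candidate cut points by the association they induce.
for threshold in (1.5, 3.5):
    binned = [v > threshold for v in values]
    print(threshold, gk_lambda(binned, response), gk_tau(binned, response))
```

On this toy data the cut at 3.5 separates the two response classes exactly, so both measures rank it above the cut at 1.5. A supervised discretization in the spirit of [12, 13] would choose cut points by such a score.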

Keywords: Categorical data; the GK-λ; the GK-τ; forward supervised discretization; independent supervised discretization

Citation: Wenxue Huang, Qitian Qiu. Forward Supervised Discretization for Multivariate with Categorical Responses. Big Data and Information Analytics, 2016, 1(2): 217-225. doi: 10.3934/bdia.2016005


  • [1] M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55(2004), 53-69.
  • [2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning - EWSL-91, 482(1991), 164-178.
  • [3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2(1989), 117-129.
  • [4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15(1996), 319-331.
  • [5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In Machine learning-International Workshop. Morgan Kaufmann Publishers, 2(1995), 194-202.
  • [6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Uncertainty in AI, 2(1993), 1022-1027.
  • [7] G. Gan, C. Ma and J. Wu, Data clustering: Theory, algorithms, and applications (ASA-SIAM series on statistics and applied probability), Society for Industrial and Applied Mathematics, 20(2007), xxii+466 pp.
  • [8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49(1954), 732-764.
  • [9] I. Guyon and A. Elisseeff, An Introduction to Variable and Feature Selection, Journal of Machine Learning Research, 3(2003), 1157-1182.
  • [10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11(1993), 63-90.
  • [11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1(2016), 129-137.
  • [12] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-τ, Procedia Computer Science, 17(2013), 114-120.
  • [13] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-λ, Procedia Computer Science, 30(2014), 75-80.
  • [14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics-Theory and Methods, to appear.
  • [15] R. Kerber, Chimerge: Discretization of numeric attributes, In Proceedings of the tenth national conference on Artificial intelligence. AAAI Press, 1994, 123-128.
  • [16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32(2006), 47-58.
  • [17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55(1995), 388-391.
  • [18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, Inc. 1987, New York, NY, USA.
  • [19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, 1(1967), 281-297.
  • [20] D. Olson and Y. Shi, Introduction to business data mining, Knowledge and information systems, 2007, McGraw-Hill/Irwin.
  • [21] I. Rish, An empirical study of the naive bayes classifier, IJCAI 2001 workshop on empirical methods in artificial intelligence, 2001, 41-46.
  • [22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, 21(1991), 660-674.
  • [23] STATCAN, Survey of Family Expenditures-1996.
  • [24] K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, University of Sydney, 1994.



Copyright Info: 2016, Wenxue Huang, et al., licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0).

