Forward Supervised Discretization for Multivariate with Categorical Responses

  • Received: 01 April 2016; Revised: 01 September 2016; Published: 01 July 2016
  • Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A proper supervised discretization usually achieves better results than an unsupervised one. Rather than discretizing each explanatory variable individually, as recently proposed by Huang, Pan and Wu [12, 13], we propose a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the Goodman-Kruskal tau (GK-τ) and lambda (GK-λ) measures are presented to support this claim.
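The abstract states the idea but not the procedure, so the following is only an illustrative Python sketch of what a forward supervised discretization driven by GK-τ or GK-λ could look like; it is not the authors' algorithm. Assumptions not taken from the paper: GK-τ and GK-λ follow the Goodman-Kruskal definitions [8] for predicting the response Y from X; several explanatory variables are treated as one composite variable whose categories are their joint cells; cut points are added greedily from midpoints between adjacent distinct values, up to a hypothetical `max_bins` limit; and columns are processed in the order given. All function names (`gk_tau`, `gk_lambda`, `discretize`, `greedy_cuts`, `forward_discretize`) are hypothetical.

```python
import bisect
from collections import Counter
from typing import Callable, Hashable, List, Sequence

Measure = Callable[[Sequence[Hashable], Sequence[Hashable]], float]


def gk_tau(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """GK-tau of Y given X:
    (sum_x sum_y p(x,y)^2 / p(x) - sum_y p(y)^2) / (1 - sum_y p(y)^2)."""
    n = len(ys)
    px = Counter(xs)
    base = sum((c / n) ** 2 for c in Counter(ys).values())
    if base >= 1.0:  # constant response: nothing left to predict
        return 0.0
    cond = sum((c / n) ** 2 / (px[x] / n) for (x, _), c in Counter(zip(xs, ys)).items())
    return (cond - base) / (1.0 - base)


def gk_lambda(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """GK-lambda of Y given X, from counts:
    (sum_x max_y n(x,y) - max_y n(y)) / (n - max_y n(y))."""
    n = len(ys)
    mode = max(Counter(ys).values())
    if mode == n:  # constant response
        return 0.0
    best = {}
    for (x, _), c in Counter(zip(xs, ys)).items():
        best[x] = max(best.get(x, 0), c)
    return (sum(best.values()) - mode) / (n - mode)


def discretize(values: Sequence[float], cuts: List[float]) -> List[int]:
    """Map each value to the index of its interval under sorted cut points."""
    return [bisect.bisect_right(cuts, v) for v in values]


def greedy_cuts(values, ys, context, measure: Measure, max_bins: int) -> List[float]:
    """Hypothetical helper: add one cut at a time, keeping the cut that most
    increases measure(Y | joint cell (context, bin)); stop when none improves."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

    def score(cuts: List[float]) -> float:
        return measure(list(zip(context, discretize(values, cuts))), ys)

    cuts: List[float] = []
    current = score(cuts)
    while len(cuts) < max_bins - 1:
        best = None
        for c in candidates:
            if c in cuts:
                continue
            trial = sorted(cuts + [c])
            s = score(trial)
            if best is None or s > best[0]:
                best = (s, trial)
        if best is None or best[0] <= current:  # no strict improvement
            break
        current, cuts = best
    return cuts


def forward_discretize(columns, ys, measure: Measure = gk_tau, max_bins: int = 4):
    """Hypothetical forward scheme: discretize columns one after another, each
    conditioned on the joint cells of the columns already discretized."""
    context = [()] * len(ys)  # joint cell of previously discretized columns
    all_cuts = []
    for col in columns:
        cuts = greedy_cuts(col, ys, context, measure, max_bins)
        context = [ctx + (b,) for ctx, b in zip(context, discretize(col, cuts))]
        all_cuts.append(cuts)
    return all_cuts


if __name__ == "__main__":
    # Toy data (made up for illustration): y is determined by x1 > 0.5.
    x1 = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    x2 = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]
    y = ["a"] * 4 + ["b"] * 4
    print(greedy_cuts(x2, y, [()] * len(y), gk_tau, 4))  # x2 on its own
    print(forward_discretize([x1, x2], y))               # x2 after x1
```

On this made-up toy data, `x2` discretized on its own is cut at 0.25 and 0.75, yet once `x1` has been cut at 0.5 the forward pass assigns `x2` no cuts, because it adds no association with `y` beyond what `x1` already provides. This contrast between individual and conditional discretization is the kind of effect the forward approach targets.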

    Citation: Wenxue Huang, Qitian Qiu. Forward Supervised Discretization for Multivariate with Categorical Responses[J]. Big Data and Information Analytics, 2016, 1(2): 217-225. doi: 10.3934/bdia.2016005

    [1] M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55(2004), 53-69.
    [2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning: EWSL-91, 482(1991), 164-178.
    [3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2(1989), 117-129.
    [4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15(1996), 319-331.
    [5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In: Machine Learning: International Workshop, Morgan Kaufmann Publishers, 2(1995), 194-202.
    [6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Artificial Intelligence, 2(1993), 1022-1027.
    [7] G. Gan, C. Ma and J. Wu, Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability), Society for Industrial and Applied Mathematics, 20(2007), xxii+466 pp.
    [8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49(1954), 732-764.
    [9] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, 3(2003), 1157-1182.
    [10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11(1993), 63-90.
    [11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1(2016), 129-137.
    [12] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-τ, Procedia Computer Science, 17(2013), 114-120.
    [13] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-λ, Procedia Computer Science, 30(2014), 75-80.
    [14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, to appear.
    [15] R. Kerber, ChiMerge: Discretization of numeric attributes, In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, 1992, 123-128.
    [16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32(2006), 47-58.
    [17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55(1995), 388-391.
    [18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, Inc., New York, NY, USA, 1987.
    [19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1(1967), 281-297.
    [20] D. Olson and Y. Shi, Introduction to Business Data Mining, McGraw-Hill/Irwin, 2007.
    [21] I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001, 41-46.
    [22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, 21(1991), 660-674.
    [23] STATCAN, Survey of Family Expenditures, 1996.
    [24] K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, University of Sydney, 1994.
  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
Metrics: Article views (2055), PDF downloads (493), Cited by (1)


Figures and Tables: Tables (6)
