Forward Supervised Discretization for Multivariate with Categorical Responses

  • Published: 01 July 2016
  • Primary: 62H20; Secondary: 62F07, 68T30

  • Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A well-chosen supervised discretization usually yields better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12,13], we suggest a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda are presented to support this claim.

    Citation: Wenxue Huang, Qitian Qiu. Forward Supervised Discretization for Multivariate with Categorical Responses[J]. Big Data and Information Analytics, 2016, 1(2&3): 217-225. doi: 10.3934/bdia.2016005
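
    The two association measures named in the abstract can be illustrated with a minimal sketch. It uses the standard Goodman-Kruskal definitions from [8], not the authors' forward algorithm, which this page does not spell out. The function names `gk_lambda`, `gk_tau` and `discretize`, and the toy data, are assumptions made only for illustration.

```python
import bisect
from collections import Counter


def gk_lambda(x, y):
    """Goodman-Kruskal lambda: proportional reduction in the error of
    predicting the modal category of y once x is known."""
    n = len(y)
    max_y = max(Counter(y).values())
    if max_y == n:
        return 0.0  # y is constant, nothing left to explain
    best = {}
    for (xv, _), count in Counter(zip(x, y)).items():
        # largest joint count within each category of x
        best[xv] = max(best.get(xv, 0), count)
    return (sum(best.values()) - max_y) / (n - max_y)


def gk_tau(x, y):
    """Goodman-Kruskal tau: proportional reduction in the error of
    proportional prediction of y once x is known."""
    n = len(y)
    x_counts = Counter(x)
    base = sum((c / n) ** 2 for c in Counter(y).values())
    if base == 1.0:
        return 0.0  # y is constant
    # sum over cells of p(x, y)^2 / p(x)
    cond = sum(c * c / (n * x_counts[xv])
               for (xv, _), c in Counter(zip(x, y)).items())
    return (cond - base) / (1.0 - base)


def discretize(values, cuts):
    """Map each continuous value to the index of its interval,
    given a sorted list of cut points (hypothetical helper)."""
    return [bisect.bisect_right(cuts, v) for v in values]


# Toy data (made up): one continuous explanatory variable, binary response.
values = [1.0, 2.0, 3.0, 4.0]
response = [0, 0, 1, 1]
for cut in (1.5, 2.5):
    binned = discretize(values, [cut])
    print(cut, gk_tau(binned, response), gk_lambda(binned, response))
```

    On this toy data the cut at 2.5 separates the two response classes exactly, so both measures reach 1, while the cut at 1.5 scores lower. A supervised discretization in the spirit of [12,13] would prefer the cut with the higher association.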

    [1] Boulle M. (2004) Khiops: A statistical discretization method of continuous attributes. Machine Learning 55: 53-69.
    [2] Catlett J. (1991) On changing continuous attributes into ordered discrete attributes. In: Machine Learning - EWSL-91, 482: 164-178.
    [3] Chiu D., Cheung B., Wong A. (1989) Information synthesis based on hierarchical maximum entropy discretization. Journal of Experimental and Theoretical Artificial Intelligence 2: 117-129.
    [4] Chmielewski M., Grzymala-Busse J. (1996) Global discretization of continuous attributes as preprocessing for machine learning. International Journal of Approximate Reasoning 15: 319-331.
    [5] Dougherty J., Kohavi R., Sahami M. (1995) Supervised and unsupervised discretization of continuous features. Machine learning--International Workshop. Morgan Kaufmann Publishers 2: 194-202.
    [6] Fayyad U., Irani K. (1993) Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the International Joint Conference on Uncertainty in AI 2: 1022-1027.
    [7] Gan G., Ma C., Wu J. (2007) Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability 20. Society for Industrial and Applied Mathematics, xxii+466 pp. doi: 10.1137/1.9780898718348. MR2331172.
    [8] Goodman L., Kruskal W. (1954) Measures of association for cross classifications. Journal of the American Statistical Association 49: 732-764.
    [9] Guyon I., Elisseeff A. (2003) An introduction to variable and feature selection. Journal of Machine Learning Research 3: 1157-1182.
    [10] Holte R. (1993) Very simple classification rules perform well on most commonly used datasets. Machine Learning 11: 63-90.
    [11] Huang W., Pan Y. (2016) On balancing between optimal and proportional predictions. Big Data and Information Analytics 1: 129-137.
    [12] Huang W., Pan Y., Wu J. (2013) Supervised discretization with GK-$\tau$. Procedia Computer Science 17: 114-120.
    [13] Huang W., Pan Y., Wu J. (2014) Supervised discretization with GK-$\lambda$. Procedia Computer Science 30: 75-80.
    [14] Huang W., Shi Y., Wang X. A nominal association matrix with feature selection for categorical data. Communications in Statistics - Theory and Methods, to appear.
    [15] Kerber R. (1994) ChiMerge: Discretization of numeric attributes. In: Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press: 123-128.
    [16] Kotsiantis S., Kanellopoulos D. (2006) Discretization techniques: A recent survey. GESTS International Transactions on Computer Science and Engineering 32: 47-58.
    [17] Liu H., Setiono R. (1995) Chi2: Feature selection and discretization of numeric attributes. Proceedings of the Seventh International Conference on Tools with Artificial Intelligence 55: 388-391.
    [18] Lloyd C. (1987) Statistical Analysis with Missing Data. John Wiley & Sons, Inc., New York, NY, USA.
    [19] MacQueen J. (1967) Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability 1: 281-297.
    [20] Olson D., Shi Y. (2007) Introduction to Business Data Mining. Knowledge and information systems, McGraw-Hill/Irwin.
    [21] Rish I. (2001) An empirical study of the naive Bayes classifier. IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence: 41-46.
    [22] Safavian S., Landgrebe D. (1991) A survey of decision tree classifier methodology. IEEE Transactions on Systems, Man and Cybernetics 21: 660-674.
    [23] STATCAN, Survey of Family Expenditures - 1996.
    [24] Ting K. (1994) Discretization of Continuous-Valued Attributes and Instance-Based Learning. Basser Department of Computer Science, University of Sydney.
  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)