Forward Supervised Discretization for Multivariate with Categorical Responses

Wenxue Huang; Qitian Qiu

doi:10.3934/bdia.2016005

Big Data and Information Analytics

2016, Volume 1, Issue 2: 217-225. doi: 10.3934/bdia.2016005

School of Mathematics and Information Science, Guangzhou University, Guangzhou, Guangdong 510006, China

Received: 01 April 2016 Revised: 01 September 2016 Published: 01 July 2016

Abstract

Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, some applications require the continuous explanatory variables to be discretized. A proper supervised discretization usually achieves better results than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu [12, 13], we propose a forward supervised discretization algorithm that captures a higher association from the multiple explanatory variables to the response variable. Experiments with GK-τ and GK-λ are presented to support this claim.
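The abstract contrasts discretizing each continuous variable on its own [12, 13] with a forward scheme that looks at the explanatory variables jointly. The Python sketch below is only a hypothetical illustration of that contrast, not the authors' algorithm. The `gk_tau` and `gk_lambda` functions follow the standard Goodman-Kruskal definitions [8] of association from an explanatory variable to the response. Everything else is an assumption made for illustration: the names `discretize`, `greedy_cuts` and `joint_association`, midpoints between consecutive distinct values as candidate cut points, the greedy one-cut-at-a-time search, the `max_cuts` limit, and processing the continuous variables in the order given. In forward mode, each variable's cuts are chosen to maximize the association from the combination of all bins fixed so far to the response.

```python
# Hypothetical sketch: forward vs. independent supervised discretization driven by
# Goodman-Kruskal association measures. Names, midpoint candidate cuts, the greedy
# search and max_cuts are illustrative assumptions, not the paper's algorithm.
from bisect import bisect_left
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence

Measure = Callable[[Sequence[Hashable], Sequence[Hashable]], float]


def gk_tau(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Goodman-Kruskal tau of y given x: relative drop in proportional-prediction error."""
    n = len(y)
    px, py = Counter(x), Counter(y)
    base = sum((c / n) ** 2 for c in py.values())
    if base >= 1.0:  # constant response: nothing to explain
        return 0.0
    cond = sum((c / n) ** 2 / (px[xi] / n) for (xi, _), c in Counter(zip(x, y)).items())
    return (cond - base) / (1.0 - base)


def gk_lambda(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Goodman-Kruskal lambda of y given x: relative drop in optimal (modal) prediction error."""
    n = len(y)
    mode_y = max(Counter(y).values())
    if mode_y == n:  # constant response
        return 0.0
    best: Dict[Hashable, int] = {}
    for (xi, _), c in Counter(zip(x, y)).items():
        best[xi] = max(best.get(xi, 0), c)
    return (sum(best.values()) - mode_y) / (n - mode_y)


def _extend(context: List[tuple], values: Sequence[float], cuts: List[float]) -> List[tuple]:
    """Append each row's bin index (number of sorted cut points below its value) to its joint key."""
    return [ctx + (bisect_left(cuts, v),) for ctx, v in zip(context, values)]


def greedy_cuts(values: Sequence[float], y: Sequence[Hashable], context: List[tuple],
                measure: Measure, max_cuts: int, tol: float = 1e-12) -> List[float]:
    """Add cut points one at a time, keeping the one that most raises measure(joint key, y)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]  # midpoints
    cuts: List[float] = []
    current = measure(_extend(context, values, cuts), y)
    while len(cuts) < max_cuts:
        best_cut, best_score = None, current
        for c in candidates:
            if c in cuts:
                continue
            score = measure(_extend(context, values, sorted(cuts + [c])), y)
            if score > best_score + tol:
                best_cut, best_score = c, score
        if best_cut is None:  # no remaining cut raises the association
            break
        cuts, current = sorted(cuts + [best_cut]), best_score
    return cuts


def discretize(columns: Dict[str, Sequence[float]], y: Sequence[Hashable],
               measure: Measure = gk_tau, max_cuts: int = 1,
               forward: bool = True) -> Dict[str, List[float]]:
    """forward=True: cut each variable given the bins already fixed for earlier variables.
    forward=False: cut each variable against y on its own (independent discretization)."""
    n = len(y)
    context: List[tuple] = [()] * n
    result: Dict[str, List[float]] = {}
    for name, values in columns.items():  # assumption: variables taken in the given order
        start = context if forward else [()] * n
        result[name] = greedy_cuts(values, y, start, measure, max_cuts)
        context = _extend(context, values, result[name])
    return result


def joint_association(columns: Dict[str, Sequence[float]], cuts: Dict[str, List[float]],
                      y: Sequence[Hashable], measure: Measure = gk_tau) -> float:
    """Association from all discretized variables, taken jointly, to y."""
    keys: List[tuple] = [()] * len(y)
    for name, values in columns.items():
        keys = _extend(keys, values, cuts[name])
    return measure(keys, y)


if __name__ == "__main__":
    cols = {"x1": [1, 2, 3, 4, 5, 6, 7, 8], "x2": [3, 4, 1, 2, 5, 6, 7, 8]}
    labels = ["a", "a", "b", "b", "c", "c", "c", "c"]
    for fwd in (False, True):
        cuts = discretize(cols, labels, forward=fwd)
        print("forward" if fwd else "independent", cuts, joint_association(cols, cuts, labels))
```

On this toy data, independent discretization cuts both variables at 4.5, which only separates class c from the rest, so the joint GK-τ is 0.6. The forward pass instead cuts x2 at 2.5 given the bins already chosen for x1, which separates a from b as well and raises the joint GK-τ to 1.0. Passing `measure=gk_lambda` switches the criterion to GK-λ.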
Keywords:
- Categorical data
- the GK-λ
- the GK-τ
- forward supervised discretization
- independent supervised discretization

Citation: Wenxue Huang, Qitian Qiu. Forward Supervised Discretization for Multivariate with Categorical Responses[J]. Big Data and Information Analytics, 2016, 1(2): 217-225. doi: 10.3934/bdia.2016005


References

[1] M. Boullé, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55 (2004), 53-69.
[2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning EWSL-91, 482 (1991), 164-178.
[3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2 (1989), 117-129.
[4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15 (1996), 319-331.
[5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In: Machine Learning: International Workshop, Morgan Kaufmann Publishers, 2 (1995), 194-202.
[6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Uncertainty in AI, 2 (1993), 1022-1027.
[7] G. Gan, C. Ma and J. Wu, Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability), Society for Industrial and Applied Mathematics, 20 (2007), xxii+466 pp.
[8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49 (1954), 732-764.
[9] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, 3 (2003), 1157-1182.
[10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993), 63-90.
[11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1 (2016), 129-137.
[12] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-τ, Procedia Computer Science, 17 (2013), 114-120.
[13] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-λ, Procedia Computer Science, 30 (2014), 75-80.
[14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, to appear.
[15] R. Kerber, ChiMerge: Discretization of numeric attributes, In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, 1994, 123-128.
[16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32 (2006), 47-58.
[17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, 55 (1995), 388-391.
[18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, Inc., New York, NY, USA, 1987.
[19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1 (1967), 281-297.
[20] D. Olson and Y. Shi, Introduction to Business Data Mining, Knowledge and Information Systems, McGraw-Hill/Irwin, 2007.
[21] I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, 2001, 41-46.
[22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, 21 (1991), 660-674.
[23] STATCAN, Survey of Family Expenditures - 1996.
[24] K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, University of Sydney, 1994.

© 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)

