Forward Supervised Discretization for Multivariate with Categorical Responses

  • Received: 01 April 2016 Revised: 01 September 2016 Published: 01 July 2016
  • Given a data set with one categorical response variable and multiple categorical or continuous explanatory variables, it is required in some applications to discretize the continuous explanatory ones. A proper supervised discretization usually achieves a better result than an unsupervised one. Rather than discretizing each variable individually, as recently proposed by Huang, Pan and Wu in [12,13], we suggest a forward supervised discretization algorithm to capture a higher association from the multiple explanatory variables to the response variable. Experiments with the GK-tau and the GK-lambda are presented to support the statement.

    Citation: Wenxue Huang, Qitian Qiu. Forward Supervised Discretization for Multivariate with Categorical Responses[J]. Big Data and Information Analytics, 2016, 1(2): 217-225. doi: 10.3934/bdia.2016005



    1. Introduction

    In data analysis and mining, discretization algorithms turn continuous variables into a number of levels. For instance, income, as a continuous variable, needs to be leveled as low, medium or high for executable user profiling or for descriptive analysis. Many data mining technologies or methods also require categorical explanatory variables. For example, Bayesian classification [21] assumes that the explanatory variables are all categorical or discrete; a decision tree [22] consists of nodes, each describing the condition that leads to the next node. The variables involved in each node have to be either categorical or described as a combination of intervals. Many real data sets contain continuous variables such as assets, income, debt, age, numerical measures of risk, etc.

    A natural method of grouping distinct values of a continuous variable is to seek data-based cutting points that cut the whole range of data into intervals. There are two ways to identify the intervals: with or without a given response variable. An algorithm that discretizes continuous variables with respect to a response variable (with a given criterion or objective function) is called supervised discretization, while the other (with no link to the response variable) is called unsupervised discretization [5]. The latter is often used in industrial data mining projects to deal with multiple response variables at the same time for convenience, unification or for the purpose of saving production cost. One may use a normal-distribution-based unsupervised discretization to process macro numerical consumption data because of the central limit theorem. Other unsupervised discretization methods include, but are not limited to, equal frequency intervals, equal width intervals [5] or more sophisticated ones associated with certain criterion measures (or objective functions), say, the information theoretic entropy [4,6]. The idea is to minimize or maximize the measures in each interval by adjusting the boundaries. Various clustering algorithms can also be used to accomplish an unsupervised discretization [7] in multi-dimensional cases, compartmentalizing a continuous multidimensional space into a given finite number of parts.
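
    As a point of reference, the two simplest unsupervised schemes mentioned above (equal-width and equal-frequency intervals) can be sketched in a few lines. The sketch below is our illustration, not code from the cited works; it assumes Python with numpy and pandas, and the column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical continuous variable (income-like values).
x = pd.Series(np.random.lognormal(mean=10, sigma=0.5, size=1000), name="income")

# Equal-width intervals: split the observed range [min, max] into 3 bins of equal length.
equal_width = pd.cut(x, bins=3, labels=["low", "medium", "high"])

# Equal-frequency intervals: boundaries at the 1/3 and 2/3 quantiles,
# so each level holds roughly the same number of records.
equal_freq = pd.qcut(x, q=3, labels=["low", "medium", "high"])

print(equal_width.value_counts())
print(equal_freq.value_counts())
```

    Neither scheme looks at a response variable; the supervised methods discussed next move the boundaries to optimize an association with the response instead.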

    In contrast to the unsupervised ones, supervised discretization algorithms seek the boundaries by optimizing the intervals' coherence [16] with a target variable under an optimization criterion. In other words, an evaluation function is applied to measure the quality of the discretization. Typical measures include conditional entropy, conditional Gini concentration and Chi-square. The Chi-square based methods include ChiMerge [15], Chi2 [17], Khiops [1], etc. One can refer to [2,3,10,20] for detailed discussions of the (conditional) entropy based methods. The conditional Gini concentration based methods can be found in [12,13]. The choice of discretization method depends on the computational complexity, the accuracy sought, the interpretability [16] and/or how easily the result can be executed. Usually the unsupervised discretization methods are faster than the supervised ones but are less accurate in predicting the target variable. Dougherty et al. [5] show that (conditional) entropy-based discretization methods perform quite well overall regarding the proportional association. Holte showed in [10] that even a one-dimensional supervised discretization system can yield classification results similar to those of a multi-dimensional one if the system is carefully tuned.

    Rather than using a (conditional) entropy-based discretization method, Huang et al. propose two alternatives in [12,13]: proportional- and modal-association-based independently (or individually) supervised discretization. Besides the expected higher association with the response and only a limited increase in computational complexity over an unsupervised one, they also argue that their measures are more interpretable than the conditional entropy ones.

    Suppose we work with a categorical response variable and a number of continuous explanatory variables. We propose a forward supervised discretization algorithm that captures a better association with the response variable than the independently supervised discretization algorithm proposed by Huang et al. Experiments in the latter part of this article show remarkable improvements, with an acceptable increase in computational complexity, with respect to the same associations proposed in [12] and [13].

    This article is organized as follows. Section 2 recalls the concept of categorical variable association measures, especially the GK-tau and the GK-lambda. We also briefly recall Huang et al.'s independently supervised discretization algorithm in this section. The forward supervised discretization algorithm is presented in Section 3. Although the algorithm is introduced with the GK-tau, the GK-tau can be replaced by any other association measure, including the GK-lambda.

    Section 4 shows supporting evidence from experiments with the GK-tau and the GK-lambda. The major issues are summarized and commented on in the last section.


    2. Categorical variable association measures and independently supervised discretization

    Let $X$ and $Y$ be categorical variables with domains $Dmn(X)=\{1,2,\ldots,n_X\}$ and $Dmn(Y)=\{1,2,\ldots,n_Y\}$, respectively, where $X$ is an explanatory variable and $Y$ is a response (target) variable.

    According to [18,p.70], an association measure (or degree) of Y on X is, in general, of the form

    $$\delta_{Y|X}=\frac{V(Y)-E_X\{E_Y[V(Y|X)]\}}{V(Y)},$$

    where $V$ is a given variance and $E$ is the corresponding expectation. The choice of a variance measure depends on the objective of the data analysis, the predictive model to be used, or even on the analyst's preference: the Gini, the entropy and the Chi-square are typical choices. In this article, we choose the proportional and the modal prediction oriented variances, the GK-tau and the GK-lambda, as in [8,11,12,13,18], for their statistical interpretability.

    The proportional association degree of $Y$ on $X$, denoted by $\tau_{Y|X}$ and also referred to as the GK-tau, is given by

    $$\tau_{Y|X}=\frac{\sum_{j=1}^{n_Y}\sum_{i=1}^{n_X}\frac{p(X=i,Y=j)^2}{p(X=i)}-\sum_{j=1}^{n_Y}p(Y=j)^2}{1-\sum_{j=1}^{n_Y}p(Y=j)^2},$$

    where $p(\cdot)$ is the probability of an event. To help the reader better understand $\tau_{Y|X}$, we recall

    (1) $\omega_{Y|X}=\sum_{j=1}^{n_Y}\sum_{i=1}^{n_X}\frac{p(X=i,Y=j)^2}{p(X=i)}=\sum_{j=1}^{n_Y}\sum_{i=1}^{n_X}p(X=i,Y=j)\,p(Y=j\,|\,X=i)$,

    (2) $E(p(Y))=\sum_{j=1}^{n_Y}p(Y=j)^2$, and

    (3) $Gini(Y)=1-E(p(Y))$,

    where $\omega_{Y|X}$ is the expected accuracy rate for predicting $Y$ given $X$; $E(p(Y))$ is the expected accuracy rate for predicting $Y$ by its own distribution; $\tau_{Y|X}$ is the error reduction rate of predicting $Y$ given $X$ over predicting $Y$ on its own. Please refer to [12] and [8,18] for more detailed discussions.
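
    The following sketch makes the quantities above concrete. It is our illustration rather than code from [12,13]; it assumes Python with numpy and a contingency table `counts` of observed joint counts, with one row per category of $X$, one column per category of $Y$, and every row non-empty.

```python
import numpy as np

def gk_tau(counts):
    """omega, E(p(Y)), Gini(Y) and GK-tau of Y on X from an n_X-by-n_Y count table."""
    p_xy = counts / counts.sum()            # joint probabilities p(X=i, Y=j)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginals p(X=i), assumed positive
    p_y = p_xy.sum(axis=0)                  # marginals p(Y=j)

    omega = np.sum(p_xy**2 / p_x)           # expected accuracy of proportional prediction of Y given X
    e_p_y = np.sum(p_y**2)                  # expected accuracy of predicting Y by its own distribution
    gini_y = 1.0 - e_p_y                    # Gini(Y)
    tau = (omega - e_p_y) / gini_y          # error reduction rate of using X over not using X
    return omega, e_p_y, gini_y, tau

# Tiny worked example: 3 categories of X, 2 categories of Y.
counts = np.array([[30.0, 10.0],
                   [ 5.0, 25.0],
                   [20.0, 10.0]])
print(gk_tau(counts))
```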

    The optimal (or modal) association degree of $Y$ on $X$, denoted by $\lambda_{Y|X}$, is given by

    $$\lambda_{Y|X}=\frac{\sum_{i=1}^{n_X}\rho_{im}-\rho_{\cdot m}}{1-\rho_{\cdot m}},$$

    where $\rho(\cdot)$ is the probability of an event,

    $$\rho_{im}=\max_{j\in\{1,2,\ldots,n_Y\}}\rho(X=i,Y=j),$$

    and $\rho_{\cdot m}=\max_{j\in\{1,2,\ldots,n_Y\}}\rho(Y=j)$. Here $\rho_{\cdot m}$ is the theoretical prediction accuracy rate for modally predicting the target variable $Y$ without information on variable $X$, while $\sum_{i=1}^{n_X}\rho_{im}$ is the accuracy rate for modally predicting $Y$ based on the information of variable $X$. Obviously, $\lambda_{Y|X}$ is the error reduction rate of using the information of $X$ over using only the marginal information of $Y$ by mode. A detailed discussion of $\lambda_{Y|X}$ can be found in [8,p.71].
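
    A companion sketch for the modal measure, under the same assumptions as the GK-tau sketch (our illustration; numpy, an n_X-by-n_Y table of joint counts, and $Y$ taking more than one value so that the denominator is positive):

```python
import numpy as np

def gk_lambda(counts):
    """GK-lambda of Y on X from an n_X-by-n_Y contingency table of joint counts."""
    p_xy = counts / counts.sum()   # joint probabilities rho(X=i, Y=j)
    p_y = p_xy.sum(axis=0)         # marginals rho(Y=j)

    rho_im = p_xy.max(axis=1)      # rho_{im}: accuracy of the modal guess of Y within each X category
    rho_m = p_y.max()              # rho_{.m}: accuracy of the modal guess of Y without X

    # error reduction rate of modal prediction with X over modal prediction without X
    return (rho_im.sum() - rho_m) / (1.0 - rho_m)

counts = np.array([[30.0, 10.0],
                   [ 5.0, 25.0],
                   [20.0, 10.0]])
print(gk_lambda(counts))
```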

    Recall from [12] again that, for a continuous explanatory variable $X$ and a categorical nominal variable $Y$ with $n_Y$ different values in a given data set, assume $B_k=\{b_1,b_2,\ldots,b_k\}$ is a set of distinct values with $b_1<b_2<\cdots<b_k$; then $B_k$ cuts the continuous variable $X$ into $k+1$ intervals: $(-\infty,b_1],(b_1,b_2],\ldots,(b_k,\infty)$. The association measure $\tau_{Y|X}$ for a given cutting point set $B_k$ can be defined as:

    $$\tau_{Y|X}(B_k)=\frac{\sum_{j=1}^{n_Y}\sum_{i=1}^{k+1}\frac{p(b_{i-1}<X\le b_i,Y=j)^2}{p(b_{i-1}<X\le b_i)}-\sum_{j=1}^{n_Y}p(Y=j)^2}{1-\sum_{j=1}^{n_Y}p(Y=j)^2},$$

    where $b_0=-\infty$ and $b_{k+1}=\infty$.

    Huang et al. proposed an optimal splitting search scheme in [12] to find these cutting points. The scheme can be briefly described as follows:

    Step 1. The first cutting point, denoted $r_1(X)$, can be found by

    $$r_1(X)=\arg\max_{\min(X)<r<\max(X)}\tau_{Y|((-\infty,r],(r,\infty))};$$

    Step 2. The second cutting point, denoted $r_2(X)$, can be found by

    $$r_2(X)=\arg\max_{\min(X)<r<\max(X),\; r\neq r_1(X)}\tau_{Y|((-\infty,\min(r_1,r)],(\min(r_1,r),\max(r_1,r)],(\max(r_1,r),\infty))};$$

    Step 3. Continue the steps until a predefined number, say $m$, of intervals are found.

    Please note that the number of values of a given variable in a given data set is always finite, even if the variable is defined as continuous. Thus not only the aforementioned $\min(X)$ and $\max(X)$, but also the number of search steps, are all finite. Besides, one can also discretize the continuous variable in an unsupervised manner first to speed up the search.
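
    A minimal sketch of this greedy cut-point search, under the simplifying assumptions that the candidate boundaries are the distinct observed values of $X$ and that the association measure is computed from a contingency table of intervals versus categories of $Y$ (our illustration; the function names are ours, and any other measure could be plugged in for the GK-tau):

```python
import numpy as np
import pandas as pd

def tau_from_table(counts):
    """GK-tau from a contingency table (rows: intervals of X, columns: values of Y)."""
    p = counts / counts.sum()
    p_x = p.sum(axis=1, keepdims=True)
    p_y = p.sum(axis=0)
    omega = np.sum(p**2 / np.where(p_x > 0, p_x, 1.0))  # empty intervals contribute 0
    e_p_y = np.sum(p_y**2)
    return (omega - e_p_y) / (1.0 - e_p_y)

def supervised_cuts(x, y, m=3, measure=tau_from_table):
    """Greedy search in the spirit of [12]: add one boundary at a time until m intervals exist."""
    x, y = np.asarray(x, dtype=float), np.asarray(y)
    candidates = np.unique(x)[:-1]            # finite set of possible cutting points
    cuts, best_score = [], -np.inf
    while len(cuts) < m - 1:
        best_r, best_score = None, -np.inf
        for r in candidates:
            if r in cuts:
                continue
            # interval index of each record for the boundaries found so far plus r
            bins = np.digitize(x, sorted(cuts + [r]), right=True)
            table = pd.crosstab(bins, y).to_numpy().astype(float)
            score = measure(table)
            if score > best_score:
                best_r, best_score = r, score
        cuts.append(best_r)
    return sorted(cuts), best_score
```

    This naive sketch re-scans every candidate value at each step; in practice one would restrict the candidates, for example by a preliminary unsupervised binning, exactly as suggested above.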

    Please also note that the previous search scheme also applies to the GK-$\lambda$ case, where the association is as follows:

    $$\lambda_{Y|X}(B_k)=\frac{\sum_{i=1}^{k+1}\max_{j\in\{1,2,\ldots,n_Y\}}p(b_{i-1}<X\le b_i,Y=j)-\max_{j\in\{1,2,\ldots,n_Y\}}\{p(Y=j)\}}{1-\max_{j\in\{1,2,\ldots,n_Y\}}\{p(Y=j)\}},$$

    where $b_0=-\infty$ and $b_{k+1}=\infty$.


    3. Forward supervised discretization

    For a data set with $n$ continuous independent variables, $X_1,X_2,\ldots,X_n$, we first discretize them by the independently supervised discretization approach introduced in the previous section. The discretized independent variables are denoted as $idX_i$, $i=1,2,\ldots,n$. We then identify the leading one, denoted as $idX_{i_0}$, that brings in the maximum association to the target, i.e., $idX_{i_0}=\arg\max_{i=1,2,\ldots,n}\tau_{Y|idX_i}$.

    We assume without loss of generality that $i_0=1$. Then $idX_1$ is the base (categorical) variable for our forward supervised discretization, which means that the subsequent discretization of any other $X_i$, $i\neq 1$, is based on $idX_1$. The search for cutting points in $X_i$ is similar to that in Section 2, as follows.

    Step 1. The first cutting point, denoted $fd_1r_1(X_i)$, is determined by

    $$fd_1r_1(X_i)=\arg\max_{\min(X_i)<r<\max(X_i)}\tau_{Y|(idX_1,((-\infty,r],(r,\infty)))};$$

    Step 2. The second cutting point, denoted $fd_1r_2(X_i)$, is determined by

    $$fd_1r_2(X_i)=\arg\max_{\min(X_i)<r<\max(X_i),\; r\neq fd_1r_1(X_i)}\tau_{Y|(idX_1,\,\Pi_1(r))},$$

    where $\Pi_1(r)$ denotes the partition

    $$\Pi_1(r)=\{(-\infty,\min(fd_1r_1,r)],(\min(fd_1r_1,r),\max(fd_1r_1,r)],(\max(fd_1r_1,r),\infty)\};$$

    Step 3. Continue the process in this fashion until a predefined number, say $m$, of intervals are found.
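
    A sketch of one such forward step, under the same assumptions as the earlier sketches (our illustration, not code from [12,13]; the helper names are hypothetical): the already-fixed discretized columns, e.g. $idX_1$, are combined with the candidate intervals of $X_i$ into one composite explanatory variable before the association is evaluated.

```python
import numpy as np
import pandas as pd

def tau_from_table(counts):
    """GK-tau from a contingency table (rows: explanatory cells, columns: values of Y)."""
    p = counts / counts.sum()
    p_x = p.sum(axis=1, keepdims=True)
    p_y = p.sum(axis=0)
    omega = np.sum(p**2 / np.where(p_x > 0, p_x, 1.0))
    e_p_y = np.sum(p_y**2)
    return (omega - e_p_y) / (1.0 - e_p_y)

def forward_cuts(x_i, y, base_columns, m=3, measure=tau_from_table):
    """Discretize x_i given previously discretized base columns (a list of integer arrays)."""
    x_i, y = np.asarray(x_i, dtype=float), np.asarray(y)
    # Encode the already-fixed discretized variables as one composite categorical column.
    base = (pd.Series(["|".join(map(str, t)) for t in zip(*base_columns)])
            if base_columns else pd.Series(["_"] * len(y)))
    candidates = np.unique(x_i)[:-1]
    cuts, best_score = [], -np.inf
    while len(cuts) < m - 1:
        best_r, best_score = None, -np.inf
        for r in candidates:
            if r in cuts:
                continue
            bins = np.digitize(x_i, sorted(cuts + [r]), right=True)
            # The explanatory side is the pair (base categories, interval of x_i).
            composite = base + "|" + pd.Series(bins).astype(str)
            table = pd.crosstab(composite, y).to_numpy().astype(float)
            score = measure(table)
            if score > best_score:
                best_r, best_score = r, score
        cuts.append(best_r)
    return sorted(cuts), best_score
```

    With `base_columns` empty this reduces to the independently supervised search of Section 2, so one function can cover both rounds.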

    When all the variables are discretized by this procedure, denoted as $fd_1(X_i)$, $i=2,\ldots,n$, we have two results for each $X_i$: one from the independently supervised discretization and one from the forward supervised discretization. One might ask which one brings in a higher association when working together with $idX_1$. We will show by experiments in the next section that the latter wins.

    We further assume without loss of generality that $fd_1(X_2)=\arg\max_{i=2,\ldots,n}\tau_{Y|(idX_1,fd_1(X_i))}$. We can then continue to discretize the other variables, following the pattern above, by $idX_1$ and $fd_1(X_2)$.

    Step 1. The first cutting point, denoted $fd_2r_1(X_i)$, is determined by

    $$fd_2r_1(X_i)=\arg\max_{\min(X_i)<r<\max(X_i)}\tau_{Y|((idX_1,fd_1(X_2)),((-\infty,r],(r,\infty)))};$$

    Step 2. The second cutting point, denoted $fd_2r_2(X_i)$, is determined by

    $$fd_2r_2(X_i)=\arg\max_{\min(X_i)<r<\max(X_i),\; r\neq fd_2r_1(X_i)}\tau_{Y|((idX_1,fd_1(X_2)),\,\Pi_2(r))},$$

    where $\Pi_2(r)$ denotes the partition

    $$\Pi_2(r)=\{(-\infty,\min(fd_2r_1,r)],(\min(fd_2r_1,r),\max(fd_2r_1,r)],(\max(fd_2r_1,r),\infty)\}.$$

    Step 3. Continue the process until a predefined number, say $m$, of intervals are found.

    We admit that the forward discretizing scheme in this article is a variation of the stepwise feature selection procedure [9]. When to stop the discretizing loop depends on the condition for stopping the search for the next variable. The conditions include, but are not limited to, reaching the maximum joint association or reaching a predefined maximum number of variables.

    Naturally, the computational expense of the forward supervised discretization is significantly higher than that of the individual one. But the difference is still acceptable, since our forward procedure is based on the individual one and the individual one finally chooses very few predictors. Given that this article recommends an alternative discretization procedure that can increase the association from the independent variables to the target, we believe it is worth the cost.
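
    Putting the pieces together, a driver for the whole forward procedure might look like the following sketch, assuming the `supervised_cuts` and `forward_cuts` helpers from the earlier sketches are in scope (the function and parameter names are ours, and the stopping rule shown is just one of the conditions listed above):

```python
import numpy as np

def forward_supervised_discretization(X_cols, y, m=3, max_vars=None):
    """X_cols: dict {name: continuous array}. Returns [(name, cutting points), ...]."""
    # Round 0: independently supervised discretization of every continuous variable.
    indiv = {name: supervised_cuts(x, y, m=m) for name, x in X_cols.items()}
    # Leading variable: the one with the highest association to the target.
    lead = max(indiv, key=lambda name: indiv[name][1])
    chosen = [(lead, indiv[lead][0])]
    base = [np.digitize(np.asarray(X_cols[lead], dtype=float), indiv[lead][0], right=True)]
    remaining = [n for n in X_cols if n != lead]
    best_so_far = indiv[lead][1]

    while remaining and (max_vars is None or len(chosen) < max_vars):
        # Forward step: re-discretize each remaining variable conditional on the base.
        results = {n: forward_cuts(X_cols[n], y, base, m=m) for n in remaining}
        nxt = max(results, key=lambda n: results[n][1])
        cuts, score = results[nxt]
        if score <= best_so_far:   # stop once the joint association no longer grows
            break
        chosen.append((nxt, cuts))
        base.append(np.digitize(np.asarray(X_cols[nxt], dtype=float), cuts, right=True))
        remaining.remove(nxt)
        best_so_far = score
    return chosen
```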


    4. Empirical experiment and discussion

    Experiments. The purpose of this article's experiments is to show how the forward supervised discretization method improves the variable association in the multivariate case for both the GK-tau and the GK-lambda. Not only the associations but also the independent variables' domain sizes are evaluated under different circumstances to demonstrate the approximate reliability, in a statistical sense, of each chosen variable set. In general, a variable set with a smaller domain size has higher confidence power. When two variable sets have the same association, the one with the smaller domain size is preferred in most feature selection methods.

    The data set in our experiment is the Survey of Family Expenditure conducted by Statistics Canada in 1996 (Famex96) [23]. It has 10,417 rows with over 200 continuous and categorical variables. We use five of them as the continuous independent variables: Income Before Taxes (Inc-btax), Total Expenditure (Tot-expn), Household Income Before Taxes (Hh-incbt), Paper, Plastic And Foil Household Supplies (Hh-suply) and Age (age). Two target variables are manually picked from the data set to exemplify our statement. The first target variable is Class Of Tenure (Tenure) with four classes (1-4). The second target variable is the educational level (Edn) with six classes (1-6). Both variables are ordinal in nature, but we treat them as nominal for simplicity. The maximal number of intervals for each continuous independent variable is set as m=3, again for simplicity.

    Since the approach in this article is inspired by and based on the independently supervised discretization algorithm, we compare the two in different scenarios in both experiments. Please note that ω, rather than τ, is reported in the first experiment to explain the selection of the various models, not only because the two are mathematically equivalent for this purpose, but also because we believe ω has better interpretability. The same reasoning applies to the choice of ϕ over λ in the second experiment.


    4.1. The GK-τ case with Tenure

    After the independently supervised discretization by GK-τ, we get the independent variables' associations as in Table 1.

    Table 1. ω and τ: the first round of discretization.

    Variable    Boundary 1   Boundary 2   ω_{Y|X}   τ_{Y|X}   |Dmn(Y,X)|
    Inc-btax    29875        44813        0.3456    0.0443    12
    Tot-expn    21274        41948        0.3802    0.0946    12
    Hh-incbt    19655        35337        0.3848    0.0694    12
    Hh-suply    185          371          0.3376    0.0324    12
    Age         34           54           0.3896    0.1084    12

    Thus the best variable after the independently supervised discretization is Age, with ω=0.3896. Denoting the independently discretized variables as idInc-btax, idTot-expn, idHh-incbt, idHh-suply and idAge respectively, the ω's for the 2-variable groups (idAge, idInc-btax), (idAge, idTot-expn), (idAge, idHh-incbt) and (idAge, idHh-suply) are 0.4134, 0.4523, 0.4527 and 0.4082, respectively. The best group is then (idAge, idHh-incbt) with ω=0.4527. Going on with the same process gives us the best 3-variable group as (idAge, idHh-incbt, idTot-expn) with ω=0.4661. Correspondingly, the domain sizes for the best 2-variable and 3-variable groups are 36 and 108.

    The difference between the independently supervised discretization and the forward supervised discretization begins at the 2-variable groups. Using idAge as the first variable for the subsequent forward supervised discretization process, since it has the highest ω, we can discretize the other explanatory variables by the procedure introduced in the previous section. The discretized variables are denoted as fd1Xi, where Xi is one of Inc-btax, Tot-expn, Hh-incbt and Hh-suply. Table 2 shows the detailed discretization result.

    Table 2. ω and τ: the second round of discretization.

    X              Bndry1   Bndry2   ω_{Y|idX1,fd1X}   τ_{Y|idX1,fd1X}   |(Y,idX1,fd1X)|
    fd1Inc-btax    14938    29875    0.4135            0.1433            35
    fd1Tot-expn    31611    52285    0.4506            0.1975            35
    fd1Hh-incbt    19887    35649    0.4587            0.2093            33
    fd1Hh-suply    93       185      0.4096            0.1377            35

    Table 2 shows the best 2-variable group to be (idAge, fd1Hh-incbt) with ω=0.4587 and Dmn(Y,idX1,fd1X2)=35. Since ω(Y|(idX1,fd1X2)) ≥ ω(Y|(idX1,idX2)) and Dmn(Y,idX1,fd1X2) ≤ Dmn(Y,idX1,idX2), the forward procedure is better than the individual one by both indicators.

    Continuing with the same process, we obtain the discretization result for the third variable, denoted by fd2X, in Table 3.

    Table 3. ω and τ: the third round of discretization.

    X              Bndry1   Bndry2   ω_{Y|idX1,fd1X2,fd2X}   τ_{Y|idX1,fd1X2,fd2X}   |(Y,idX1,fd1X2,fd2X)|
    fd2Inc-btax    21249    42499    0.4818                  0.2430                  72
    fd2Tot-expn    15305    44715    0.4776                  0.2369                  89
    fd2Hh-suply    132      268      0.4721                  0.2288                  107

    The best 3-variable group is then (idAge, fd1Hh-incbt, fd2Inc-btax) with ω=0.4818 and domain size 72. Given that ω(Y|idX1,fd1X2,fd2X3) ≥ ω(Y|id(X1,X2,X3)) and Dmn(Y,idX1,fd1X2,fd2X3) ≤ Dmn(Y,id(X1,X2,X3)), the forward approach is still better than the individual one at the 3-variable level.


    4.2. The GK-λ case with Edn

    After the individually supervised discretization by GK-λ with respect to Edn, we have the initial result in Table 4.

    Table 4. ϕ and λ: the first round of discretization.

    X            Bndry1   Bndry2   ϕ_{Y|X}   λ_{Y|X}   |(Y,X)|
    Inc-Btax     44813    59750    0.4331    0.0667    18
    Tot-Expn     83297    93634    0.4213    0.0473    16
    Hh-Incbt     82389    98073    0.4228    0.0497    17
    Hh-Suply     2039     2966     0.4031    0.0272    17

    The best variable regarding λ is then idInc-Btax. Calculation also shows that the 2-variable groups (idInc-Btax, idTot-Expn), (idInc-Btax, idHh-Incbt) and (idInc-Btax, idHh-Suply) have ϕ values of 0.4346, 0.4339 and 0.4335, respectively. This gives us the best 2-variable group as (idInc-Btax, idTot-Expn). The remaining 3-variable groups, (idInc-Btax, idTot-Expn, idHh-Incbt) and (idInc-Btax, idTot-Expn, idHh-Suply), show ϕ values of 0.4369 and 0.4347 respectively, which gives us the better one as (idInc-Btax, idTot-Expn, idHh-Incbt). Let Inc-Btax, Tot-Expn and Hh-Incbt be denoted as X1, X2 and X3, respectively. We also find that ϕ(Y|idX1,idX2)=0.4346 with Dmn(Y,idX1,idX2)=46, and ϕ(Y|idX1,idX2,idX3)=0.4369 with Dmn(Y,idX1,idX2,idX3)=132.

    Using idInc-Btax as the leading variable in the subsequent forward supervised discretization process, we apply the second round of discretization to the other independent variables, denoted as fd1Tot-Expn, fd1Hh-Incbt and fd1Hh-Suply, or fd1Xi, as shown in Table 5:

    Table 5. ϕ and λ: the second round of discretization.

    X              Bndry1   Bndry2   ϕ_{Y|idX1,fd1Xi}   λ_{Y|idX1,fd1Xi}   |(Y,idX1,fd1Xi)|
    fd1Tot-Expn    113774   144803   0.4353             0.0702             40
    fd1Hh-Incbt    107890   184954   0.4358             0.0712             41
    fd1Hh-Suply    834      927      0.4333             0.0670             43

    From Table 5, we see that (idInc-Btax, fd1Hh-Incbt) has the highest ϕ, 0.4358, with a domain size of 41. Hence

    $$\phi_{Y|(idX_1,fd_1X_2)}\ge\phi_{Y|(idX_1,idX_2)},$$

    with the additional advantage that

    $$|Dmn(Y,idX_1,fd_1X_2)|<|Dmn(Y,idX_1,idX_2)|.$$

    Table 6 shows the result of discretizing the third variable based on (idInc-Btax, fd1Hh-Incbt) from the previous step.

    Table 6. ϕ and λ: the third round of discretization, with two forward supervised discretized variables.

    X              Bndry1   Bndry2   ϕ_{Y|idX1,fd1X2,fd2X}   λ_{Y|idX1,fd1X2,fd2X}   |(Y,idX1,fd1X2,fd2X)|
    fd2Tot-Expn    47349    114874   0.438                   0.0747                  112
    fd2Hh-Suply    233      838      0.4375                  0.0739                  126

    The 3-variable group with the highest accuracy rate is then (idInc-Btax, fd1Hh-Incbt, fd2Tot-Expn). Correspondingly,

    $$\phi_{Y|(idX_1,fd_1X_2,fd_2X_3)}=0.438\ge\phi_{Y|(idX_1,idX_2,idX_3)}=0.4369,$$

    while

    $$|Dmn(Y,idX_1,fd_1X_2,fd_2X_3)|=112<|Dmn(Y,idX_1,idX_2,idX_3)|=132.$$

    We may continue if the number of continuous independent variables is greater than 3 and the sample size is large enough to ensure the reliability of the information (or the confidence power of the data). Both experiments show improved performance by the forward supervised discretization over the individual one.


    5. Discussion and future work

    In this article, we propose a categorical variable association based (e.g., the GK-tau or the GK-lambda) forward supervised discretization method for multi-dimensional data sets. The method is inspired by and based on the individually supervised discretization proposed in [12,13]. We demonstrate the new approach's advantage with two experiments, one based on the GK-tau and the other on the GK-lambda. The experiments also use different target variables to show the approach's robustness. Admittedly, the new approach takes more computational time, but we believe the cost is acceptable given the improved performance, including the higher association.

    Although both the individual and the forward approaches discretize one variable at a time, with the latter using information from the variables previously discretized by the same approach, it is natural to extend the idea to compartmentalizing a multi-dimensional space. A popular compartmentalization technology is clustering. Another interesting research direction is to work out the exact computational cost of the new approach.


    [1] M. Boulle, Khiops: A statistical discretization method of continuous attributes, Machine Learning, 55 (2004), 53-69.
    [2] J. Catlett, On changing continuous attributes into ordered discrete attributes, In: Machine Learning - EWSL-91, 482 (1991), 164-178.
    [3] D. Chiu, B. Cheung and A. Wong, Information synthesis based on hierarchical maximum entropy discretization, Journal of Experimental and Theoretical Artificial Intelligence, 2 (1989), 117-129.
    [4] M. Chmielewski and J. Grzymala-Busse, Global discretization of continuous attributes as preprocessing for machine learning, International Journal of Approximate Reasoning, 15 (1996), 319-331.
    [5] J. Dougherty, R. Kohavi and M. Sahami, Supervised and unsupervised discretization of continuous features, In: Machine Learning - International Workshop, Morgan Kaufmann Publishers, (1995), 194-202.
    [6] U. Fayyad and K. Irani, Multi-interval discretization of continuous-valued attributes for classification learning, Proceedings of the International Joint Conference on Artificial Intelligence, (1993), 1022-1027.
    [7] G. Gan, C. Ma and J. Wu, Data Clustering: Theory, Algorithms, and Applications (ASA-SIAM Series on Statistics and Applied Probability), Society for Industrial and Applied Mathematics, 20 (2007), xxii+466 pp.
    [8] L. Goodman and W. Kruskal, Measures of association for cross classifications, Journal of the American Statistical Association, 49 (1954), 732-764.
    [9] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research, 3 (2003), 1157-1182.
    [10] R. Holte, Very simple classification rules perform well on most commonly used datasets, Machine Learning, 11 (1993), 63-90.
    [11] W. Huang and Y. Pan, On balancing between optimal and proportional predictions, Big Data and Information Analytics, 1 (2016), 129-137.
    [12] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-τ, Procedia Computer Science, 17 (2013), 114-120.
    [13] W. Huang, Y. Pan and J. Wu, Supervised discretization with GK-λ, Procedia Computer Science, 30 (2014), 75-80.
    [14] W. Huang, Y. Shi and X. Wang, A nominal association matrix with feature selection for categorical data, Communications in Statistics - Theory and Methods, to appear.
    [15] R. Kerber, ChiMerge: Discretization of numeric attributes, In: Proceedings of the Tenth National Conference on Artificial Intelligence, AAAI Press, (1992), 123-128.
    [16] S. Kotsiantis and D. Kanellopoulos, Discretization techniques: A recent survey, GESTS International Transactions on Computer Science and Engineering, 32 (2006), 47-58.
    [17] H. Liu and R. Setiono, Chi2: Feature selection and discretization of numeric attributes, In: Proceedings of the Seventh International Conference on Tools with Artificial Intelligence, (1995), 388-391.
    [18] C. Lloyd, Statistical Analysis with Missing Data, John Wiley & Sons, Inc., New York, NY, USA, 1987.
    [19] J. MacQueen, Some methods for classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1 (1967), 281-297.
    [20] D. Olson and Y. Shi, Introduction to Business Data Mining, McGraw-Hill/Irwin, 2007.
    [21] I. Rish, An empirical study of the naive Bayes classifier, IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, (2001), 41-46.
    [22] S. Safavian and D. Landgrebe, A survey of decision tree classifier methodology, IEEE Transactions on Systems, Man and Cybernetics, 21 (1991), 660-674.
    [23] STATCAN, Survey of Family Expenditures - 1996.
    [24] K. Ting, Discretization of Continuous-Valued Attributes and Instance-Based Learning, Basser Department of Computer Science, University of Sydney, 1994.
  • © 2016 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)