Ultrahigh-dimensional data are commonly available in a wide range of scientific research and applications. Feature screening plays an essential role in the analysis of such data, and Fan and Lv [5] first proposed sure independence screening (SIS) in their seminal paper. For linear regression, they showed that screening based on Pearson correlation learning possesses the sure screening property: even if the number of predictors $P$ grows much faster than the number of observations $n$, with $\log P=O(n^{\alpha})$ for some $\alpha\in(0,1/2)$, all relevant predictors are selected with probability tending to one [6].
To address ultrahigh-dimensional feature screening in classification problems, Mai and Zou [11] applied a Kolmogorov filter to ultrahigh-dimensional binary classification, and Cui et al. [4] proposed a screening procedure based on empirical conditional distribution functions. These screening methods assume that the data are continuous. For categorical covariates, Huang et al. [8] constructed a model-free discrete feature screening method based on Pearson chi-square statistics and established its sure screening property, in the sense of Fan et al. [6], when all covariates are binary. Ni and Fang [12] proposed a model-free feature screening procedure based on information entropy theory for multi-class classification, and Ni et al. [13] further proposed a feature screening procedure based on a weighted adjusted Pearson chi-square statistic for multi-class classification. Sheng and Wang [17] proposed a model-free feature screening method based on the classification accuracy of marginal classifiers for ultrahigh-dimensional classification.

However, covariates often have a natural group structure, as in microarray, genomics, brain imaging and quantitative measurement data, and this is especially common for discrete and categorical covariates. Many grouped variable selection methods extend individual variable selection and yield sparse solutions at the group level, or even within groups; examples include group LASSO [22], group SCAD [21], group MCP [2], group hierarchical LASSO [23], group bridge [9] and group exponential LASSO [1]. When the regularization parameter is set for non-sparse estimation, some grouped variable selection algorithms may fail to converge, leading to non-identifiability and near-singularity problems. Even when the algorithm converges in the setting of many groups and a small sample size $n$, the estimated coefficients are unlikely to be globally optimal.
Therefore, new screening methods are needed to reduce the number of groups before selecting the important groups and the important variables within them. For ultrahigh-dimensional data with grouping structures, Niu et al. [15] used working independence in linear models to propose a group screening approach. Song and Xie [18] further used F-test statistics to construct a group screening approach that improves on marginal methods by reducing the burden of multiple testing and aggregating individual effects. For ultrahigh-dimensional grouped data in the linear model, Qiu and Ahn [16] proposed group sure independence screening (gSIS), the group high-dimensional ordinary least-squares projector (gHOLP) and groupwise adjusted R-squared screening (gAR2). He and Deng [7] used joint information entropy to screen for important grouped covariates.
In this study, we propose a model-free group feature screening method for ultrahigh-dimensional multi-class classification with categorical covariates. The method uses the Gini impurity to evaluate the predictive power of grouped covariates. The Gini impurity is an impurity-based attribute splitting criterion proposed by Breiman et al. [3] and widely used in decision tree algorithms such as CART and SPRINT. For categorical covariate screening, we use the purity gain, which plays the same role as the information gain in Ni and Fang [12]. As in Ni and Fang [12], continuous covariates can be sliced using standard normal quantiles. The proposed grouped feature screening procedure based on the purity gain is referred to as GP-SIS. Theoretically, GP-SIS is proven to enjoy the sure screening property of Fan and Lv [5], which ensures that all important features are retained with probability tending to one. Practically, the simulation results show that GP-SIS performs better than an existing group feature screening method (GIG-SIS) and a single-covariate feature screening method (IG-SIS).
The remainder of this paper is organized as follows. Section 2 describes the proposed GP-SIS method in detail. In Section 3, the screening property is established. In Section 4, numerical simulations and an example of real data analysis are presented to assess the performance of the proposed method. Some concluding remarks are given in Section 5, and all proofs are provided in the Appendix.
We first introduce the Gini impurity and purity gain and then propose a screening procedure based on purity gain.
We treat each group of covariates as a single unit. Suppose that $Y$ is a categorical response and $\mathbf{X}$ is an $n\times P$ covariate matrix consisting of $G$ groups of covariates, as represented in Table 1.
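$Y$ | $X_{11}$ | $\cdots$ | $X_{1p_1}$ | $\cdots$ | $X_{g1}$ | $\cdots$ | $X_{gp_g}$ | $\cdots$ | $X_{G1}$ | $\cdots$ | $X_{Gp_G}$ |
$Y_1$ | $x_{1,11}$ | $\cdots$ | $x_{1,1p_1}$ | $\cdots$ | $x_{1,g1}$ | $\cdots$ | $x_{1,gp_g}$ | $\cdots$ | $x_{1,G1}$ | $\cdots$ | $x_{1,Gp_G}$ |
$\vdots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\ddots$ | $\vdots$ | $\ddots$ | $\vdots$ |
$Y_n$ | $x_{n,11}$ | $\cdots$ | $x_{n,1p_1}$ | $\cdots$ | $x_{n,g1}$ | $\cdots$ | $x_{n,gp_g}$ | $\cdots$ | $x_{n,G1}$ | $\cdots$ | $x_{n,Gp_G}$ |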
Here, $X_g=(X_{g1},X_{g2},\dots,X_{gp_g})$ denotes the $g$-th group of covariates, $p_g$ is the number of covariates in the $g$-th group, and $P=\sum_{g=1}^{G}p_g$. To introduce the Gini impurity and purity gain, assume that every covariate in $\mathbf{X}$ takes values in $J$ categories $\{1,\dots,J\}$. Since each element of $X_g$ takes values in $\{1,\dots,J\}$, the group $X_g$ has $J^{p_g}$ possible category combinations. A generic combination is denoted by $\mathbf{j}_g=(j_1,\dots,j_{p_g})$, where $j_1$ is the category of the first covariate in the group, and so on; the combinations run from $(1,1,\dots,1)$ to the last combination $J_g=(J,J,\dots,J)$.
Let $p_r=P(Y=r)$ denote the probability function of the response, $w_{\mathbf{j}_g}=w(j_1,\dots,j_{p_g})=P(X_{g1}=j_1,\dots,X_{gp_g}=j_{p_g})$ the probability function of the group covariate, and $p_{\mathbf{j}_g r}=p_{(j_1,\dots,j_{p_g})r}=P(Y=r\mid X_{g1}=j_1,\dots,X_{gp_g}=j_{p_g})$ the probability function of the response conditional on the group covariate, where $g\in\{1,\dots,G\}$, $(j_1,\dots,j_{p_g})\in\{(1,1,\dots,1),(2,1,\dots,1),\dots,(J,J,\dots,J)\}$ and $r\in\{1,\dots,R\}$. The marginal Gini impurity of $Y$ is defined as
$$\mathrm{Gini}(Y)=1-\sum_{r=1}^{R}p_r^2. \tag{2.1}$$
Conditional Gini impurity is defined as
$$\mathrm{Gini}(Y\mid X_g)=\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}w_{\mathbf{j}_g}\Big(1-\sum_{r=1}^{R}p_{\mathbf{j}_g r}^2\Big). \tag{2.2}$$
Similar to the information gain, the purity gain is defined as
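$$\mathrm{GP}(Y\mid X_g)=\mathrm{Gini}(Y)-\mathrm{Gini}(Y\mid X_g)=1-\sum_{r=1}^{R}p_r^2-\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}w_{\mathbf{j}_g}\Big(1-\sum_{r=1}^{R}p_{\mathbf{j}_g r}^2\Big). \tag{2.3}$$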
In Eq (2.1), $\mathrm{Gini}(Y)$ is non-negative and, by Jensen's inequality, attains its maximum $1-1/R$ if and only if $p_1=\dots=p_R=1/R$. In Eq (2.2), $\mathrm{Gini}(Y\mid X_g)$ is the $w_{\mathbf{j}_g}$-weighted average of the Gini impurities of $Y$ conditional on $X_{g1}=j_1,\dots,X_{gp_g}=j_{p_g}$. The following proposition supports using the purity gain to measure dependence.
Proposition 2.1. When $X_g$ is categorical, $\mathrm{GP}(Y\mid X_g)\ge 0$, and $X_g$ and $Y$ are independent if and only if $\mathrm{GP}(Y\mid X_g)=0$.
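As a numerical check of Eqs (2.1)–(2.3) and Proposition 2.1, the Python sketch below computes the population purity gain from a joint probability table of $(X_g, Y)$. It is an illustration rather than the authors' code; the helper names `gini` and `purity_gain` and the one-row-per-combination table layout are assumptions made here. Under independence the purity gain is zero, and when $X_g$ determines $Y$ it equals $\mathrm{Gini}(Y)$.

```python
import numpy as np

def gini(p):
    """Gini impurity 1 - sum_r p_r^2 of a probability vector, Eq (2.1)."""
    p = np.asarray(p, dtype=float)
    return 1.0 - np.sum(p ** 2)

def purity_gain(joint):
    """Population purity gain GP(Y | X_g), Eq (2.3).

    joint[c, r] = P(X_g = c-th category combination, Y = r): rows index the
    combinations j_g, columns index the R response classes.
    """
    joint = np.asarray(joint, dtype=float)
    p_y = joint.sum(axis=0)                    # p_r
    w = joint.sum(axis=1)                      # w_{j_g}
    keep = w > 0                               # cells with zero mass contribute nothing
    cond = joint[keep] / w[keep][:, None]      # p_{j_g r} = P(Y = r | X_g = j_g)
    cond_gini = np.sum(w[keep] * (1.0 - np.sum(cond ** 2, axis=1)))  # Eq (2.2)
    return gini(p_y) - cond_gini

# A group with p_g = 2 binary covariates has J^{p_g} = 4 combinations.
w = np.full(4, 0.25)
independent = np.outer(w, [0.5, 0.5])          # P(X_g, Y) = P(X_g) P(Y)
dependent = np.array([[0.25, 0.0], [0.25, 0.0], [0.0, 0.25], [0.0, 0.25]])
print(purity_gain(independent))   # 0.0 (Proposition 2.1)
print(purity_gain(dependent))     # 0.5 = Gini(Y), since X_g determines Y
```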
For continuous $X_g$, the conditional Gini impurity cannot be computed directly, so we compute the purity gain after slicing each covariate into several categories. For a fixed integer $J\ge 2$, let $q_{(j)}$ be the $j/J$-th percentile of $X$, $j=1,\dots,J-1$, with $q_{(0)}=-\infty$ and $q_{(J)}=+\infty$. We replace $w_{\mathbf{j}_g}$ and $p_{\mathbf{j}_g r}$ in Eq (2.3), respectively, by
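$$w_{\mathbf{j}_g}=w(j_1,\dots,j_{p_g})=P\big(X_{g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,X_{gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\big), \tag{2.4}$$
$$p_{\mathbf{j}_g r}=p_{(j_1,\dots,j_{p_g})r}=P\big(Y=r\mid X_{g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,X_{gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\big). \tag{2.5}$$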
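Based on these sliced probabilities, we define the conditional Gini impurity and the purity gain for continuous covariates:
$$\mathrm{Gini}_J(Y\mid X_g)=\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}w_{\mathbf{j}_g}\Big(1-\sum_{r=1}^{R}p_{\mathbf{j}_g r}^2\Big), \tag{2.6}$$
$$\mathrm{GP}_J(Y\mid X_g)=\Big(1-\sum_{r=1}^{R}p_r^2\Big)-\mathrm{Gini}_J(Y\mid X_g). \tag{2.7}$$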
Proposition 2.2. When $X_g$ is continuous, $\mathrm{GP}_J(Y\mid X_g)\ge 0$, and $X_g$ and $Y$ are independent if and only if $\mathrm{GP}_J(Y\mid X_g)=0$.
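The slicing step in Eqs (2.4)–(2.5) can be sketched as follows, with the unknown percentiles $q_{(j)}$ replaced by sample percentiles, as in the estimators of this section. The function name `slice_covariate` is hypothetical, and NumPy's default linear-interpolation percentile is an assumption; other percentile conventions shift the cut points slightly.

```python
import numpy as np

def slice_covariate(x, J):
    """Map a continuous covariate to slice labels 1..J.

    Slice j collects values in (q_(j-1), q_(j)], where q_(j) is the j/J-th
    sample percentile, q_(0) = -inf and q_(J) = +inf.
    """
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, np.arange(1, J) / J)   # interior cut points q_(1), ..., q_(J-1)
    # side="left" counts the cut points strictly below x, so a value equal to
    # q_(j) lands in slice j, matching the right-closed intervals (q_(j-1), q_(j)]
    return np.searchsorted(cuts, x, side="left") + 1

print(slice_covariate(np.arange(1, 9), J=4))   # [1 1 2 2 3 3 4 4]
```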
We first select a moderately sized sub-model that contains, with high probability, the active set $\mathcal{D}=\{g: F(y\mid \mathbf{x})\text{ functionally depends on }X_g\text{ for some }y=r\}$, using the following adjusted purity gain index for each pair $(Y,X_g)$:
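$$e_g=\frac{\Big(1-\sum_{r=1}^{R}p_r^2\Big)-\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}w_{\mathbf{j}_g}\Big(1-\sum_{r=1}^{R}p_{\mathbf{j}_g r}^2\Big)}{\log N_g}. \tag{2.8}$$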
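Here, $p_r=P(Y=r)$; when $X_g$ is a categorical group, $w_{\mathbf{j}_g}=w(j_1,\dots,j_{p_g})=P(X_{g1}=j_1,\dots,X_{gp_g}=j_{p_g})$ and $p_{\mathbf{j}_g r}=p_{(j_1,\dots,j_{p_g})r}=P(Y=r\mid X_{g1}=j_1,\dots,X_{gp_g}=j_{p_g})$, and $N_g$ is the number of category combinations of $X_g$.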
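When $X_g$ is a continuous group, $p_{\mathbf{j}_g r}=P\big(Y=r\mid X_{g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,X_{gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\big)$, where $q_{g1,(j)}$ is the $j/J$-th percentile of $X_{g1}$ and $J$ is the number of slices applied to each covariate in $X_g$. In this case, $X_g$ has $J^{p_g}$ slice combinations and $N_g=J^{p_g}$.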
Under the original definition in Eq (2.3), group covariates with more categories tend to have a larger purity gain regardless of whether they are important, especially when the numbers of categories differ across group covariates. Ni and Fang [12] addressed this problem by dividing by $\log J_k$ to construct an information gain ratio, where $J_k$ is the number of categories of $X_k$. Similarly, when every covariate in $X_g$ has the same number of categories, we divide by $\log N_g$ in Eq (2.8) to build an adjusted purity gain index; the same adjustment is applied to continuous $X_g$. However, when the covariates in $X_g$ have different numbers of categories, $1-\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}w_{\mathbf{j}_g}^2$ is used as the adjustment factor, motivated by how the decision tree algorithm splits $X_g$ into several categories.
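To illustrate the bias that motivates the adjustment, the sketch below computes the unadjusted sample purity gain for two noise covariates that are independent of a binary response, one with 2 categories and one with 50. This is a hypothetical experiment, not taken from the paper: the sample size, seed and helper name `sample_purity_gain` are assumptions. With these settings the many-category covariate is expected to score higher even though neither covariate is informative.

```python
import numpy as np

def sample_purity_gain(x, y):
    """Unadjusted sample purity gain, the plug-in version of Eq (2.3), for one covariate."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(y)
    _, y_counts = np.unique(y, return_counts=True)
    gini_y = 1.0 - np.sum((y_counts / n) ** 2)
    cond_gini = 0.0
    for level in np.unique(x):                 # one cell per observed category of x
        y_cell = y[x == level]
        _, counts = np.unique(y_cell, return_counts=True)
        cond_gini += len(y_cell) / n * (1.0 - np.sum((counts / len(y_cell)) ** 2))
    return gini_y - cond_gini

rng = np.random.default_rng(0)
n = 200
y = rng.integers(1, 3, size=n)          # binary response
x_two = rng.integers(1, 3, size=n)      # 2-category noise covariate
x_many = rng.integers(1, 51, size=n)    # 50-category noise covariate
gp_two, gp_many = sample_purity_gain(x_two, y), sample_purity_gain(x_many, y)
print(gp_two, gp_many)                            # the 50-category covariate is expected to score higher
print(gp_two / np.log(2), gp_many / np.log(50))   # after dividing by log N_g, as in Eq (2.8)
```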
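For the group sample data $\{(x_{i,g1},\dots,x_{i,gp_g},y_i)\}$, $i=1,\dots,n$, $e_g$ can be estimated by
$$\hat e_g=\frac{\Big(1-\sum_{r=1}^{R}\hat p_r^2\Big)-\sum_{\mathbf{j}_g=(1,1,\dots,1)}^{J_g}\hat w_{\mathbf{j}_g}\Big(1-\sum_{r=1}^{R}\hat p_{\mathbf{j}_g r}^2\Big)}{\log N_g}. \tag{2.9}$$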
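When $X_g$ is categorical,
$$\hat w_{\mathbf{j}_g}=\frac{1}{n}\sum_{i=1}^{n}I\{x_{i,g1}=j_1,\dots,x_{i,gp_g}=j_{p_g}\}, \tag{2.10}$$
$$\hat p_{\mathbf{j}_g r}=\frac{\sum_{i=1}^{n}I\{y_i=r,\,x_{i,g1}=j_1,\dots,x_{i,gp_g}=j_{p_g}\}}{\sum_{i=1}^{n}I\{x_{i,g1}=j_1,\dots,x_{i,gp_g}=j_{p_g}\}}. \tag{2.11}$$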
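A minimal sketch of the categorical estimator in Eqs (2.9)–(2.11), assuming the group is stored as an $n\times p_g$ integer array and that $N_g$ is supplied by the caller (for example $J^{p_g}$ when every covariate in the group has $J$ categories). The function name `e_hat_categorical` is hypothetical. Combinations that never occur in the sample have $\hat w_{\mathbf{j}_g}=0$ and are simply skipped.

```python
import numpy as np

def e_hat_categorical(Xg, y, n_categories):
    """Sample adjusted purity gain e_hat_g, Eqs (2.9)-(2.11), for a categorical group.

    Xg: (n, p_g) integer array, one column per covariate in group g.
    y: length-n array of response classes.
    n_categories: N_g, the number of category combinations of X_g.
    """
    y = np.asarray(y)
    n = len(y)
    Xg = np.asarray(Xg).reshape(n, -1)
    _, y_counts = np.unique(y, return_counts=True)
    gini_y = 1.0 - np.sum((y_counts / n) ** 2)            # 1 - sum_r p_hat_r^2
    # each distinct row of Xg is one observed combination j_g
    _, cell = np.unique(Xg, axis=0, return_inverse=True)
    cell = np.asarray(cell).ravel()
    cond_gini = 0.0
    for c in np.unique(cell):
        y_c = y[cell == c]
        w_hat = len(y_c) / n                               # Eq (2.10)
        _, counts = np.unique(y_c, return_counts=True)
        p_hat = counts / len(y_c)                          # Eq (2.11)
        cond_gini += w_hat * (1.0 - np.sum(p_hat ** 2))
    return (gini_y - cond_gini) / np.log(n_categories)

y = [1, 1, 2, 2]
print(e_hat_categorical([[1, 1], [1, 1], [2, 1], [2, 1]], y, n_categories=4))  # 0.5 / log 4
print(e_hat_categorical([[1, 1], [2, 1], [1, 1], [2, 1]], y, n_categories=4))  # 0.0
```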
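When $X_g$ is continuous,
$$\hat w_{\mathbf{j}_g}=\frac{1}{n}\sum_{i=1}^{n}I\{x_{i,g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,x_{i,gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\}, \tag{2.12}$$
$$\hat p_{\mathbf{j}_g r}=\frac{\sum_{i=1}^{n}I\{y_i=r,\,x_{i,g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,x_{i,gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\}}{\sum_{i=1}^{n}I\{x_{i,g1}\in(q_{g1,(j_1-1)},q_{g1,(j_1)}],\dots,x_{i,gp_g}\in(q_{gp_g,(j_{p_g}-1)},q_{gp_g,(j_{p_g})}]\}}. \tag{2.13}$$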
Here, $q_{g1,(j)}$ is the $j/J$-th sample percentile of $\{x_{1,g1},\dots,x_{n,g1}\}$, and similarly for the other covariates in the group. In either case, $\hat p_r=\frac{1}{n}\sum_{i=1}^{n}I\{y_i=r\}$.
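For a continuous group, the estimator can be sketched by first slicing each column at its sample percentiles and then applying the same plug-in formulas, with $N_g=J^{p_g}$ so that $\log N_g=p_g\log J$. This sketch assumes every covariate in the group uses the same number of slices $J$ and NumPy's default percentile interpolation; the function name `e_hat_continuous` is hypothetical.

```python
import numpy as np

def e_hat_continuous(Xg, y, J):
    """Sample adjusted purity gain for a continuous group, Eqs (2.9), (2.12), (2.13).

    Each column of Xg is cut at its j/J-th sample percentiles, so the group has
    N_g = J ** p_g slice combinations and log N_g = p_g * log J.
    """
    y = np.asarray(y)
    n = len(y)
    Xg = np.asarray(Xg, dtype=float).reshape(n, -1)
    p_g = Xg.shape[1]
    cuts = np.quantile(Xg, np.arange(1, J) / J, axis=0)    # shape (J - 1, p_g)
    # slice index 0..J-1 per entry: number of cut points strictly below the value,
    # i.e. membership in the right-closed interval (q_(j-1), q_(j)]
    S = np.column_stack([np.searchsorted(cuts[:, k], Xg[:, k], side="left")
                         for k in range(p_g)])
    cell = S @ (J ** np.arange(p_g))                        # one integer id per combination j_g
    _, y_counts = np.unique(y, return_counts=True)
    gini_y = 1.0 - np.sum((y_counts / n) ** 2)
    cond_gini = 0.0
    for c in np.unique(cell):
        y_c = y[cell == c]
        _, counts = np.unique(y_c, return_counts=True)
        cond_gini += len(y_c) / n * (1.0 - np.sum((counts / len(y_c)) ** 2))
    return (gini_y - cond_gini) / (p_g * np.log(J))

y = np.repeat([1, 2], 50)
Xg = np.column_stack([np.arange(100.0), np.arange(100) % 2])
print(e_hat_continuous(Xg, y, J=2))   # 0.5 / (2 log 2): the first column separates y
```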
We suggest selecting the sub-model $\hat{\mathcal D}=\{g:\hat e_g\ge cn^{-\tau},\ 1\le g\le G\}$, where the thresholds $c$ and $\tau$ are the constants in condition (C2) in Section 3. In practice, we can choose $\hat{\mathcal D}=\{g:\hat e_g\text{ is among the }d\text{ largest of }\hat e_1,\dots,\hat e_G\}$, where $d=[n/\log n]$.
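Both selection rules are simple to express once $\hat e_1,\dots,\hat e_G$ are available. The sketch below assumes the scores are held in a list indexed by group; the helper names are hypothetical, ties are broken by group order, and the returned group indices are 1-based to match the notation $1\le g\le G$.

```python
import numpy as np

def select_top_d(e_hat, n):
    """Practical rule: keep the d = [n / log n] groups with the largest e_hat_g."""
    d = int(np.floor(n / np.log(n)))
    order = np.argsort(-np.asarray(e_hat, dtype=float), kind="stable")  # descending scores
    return sorted((order[:d] + 1).tolist())                             # 1-based group indices

def select_by_threshold(e_hat, n, c, tau):
    """Threshold rule: keep groups with e_hat_g >= c * n^(-tau)."""
    return [g + 1 for g, e in enumerate(e_hat) if e >= c * n ** (-tau)]

e_hat = [0.1, 0.9, 0.3, 0.8, 0.05, 0.7, 0.2, 0.6, 0.0, 0.5]
print(select_top_d(e_hat, n=20))                        # d = 6 -> [2, 3, 4, 6, 8, 10]
print(select_by_threshold(e_hat, n=4, c=1.0, tau=0.5))  # cutoff 0.5 -> [2, 4, 6, 8, 10]
```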
In this section, we establish the screening properties of GP-SIS. Following the sure independence screening theory of Ni and Fang [12] and He and Deng [7], we assume the following conditions:
Condition 1 (C1). There exist two positive constants $c_1$ and $c_2$ such that $c_1/R\le p_r\le c_2/R$, $c_1+c_2\le R$, $c_1/R\le p_{\mathbf{j}_g r}\le c_2/R$ and $c_1/N_g\le w_{\mathbf{j}_g}\le c_2/N_g$ for every $1\le g\le G$, $1\le r\le R$ and $\mathbf{j}_g\in\{(1,1,\dots,1),(2,1,\dots,1),\dots,(J,J,\dots,J)\}$.
Condition 2 (C2). There exist constants $c>0$ and $0\le\tau<1/2$ such that $\min_{g\in\mathcal D}e_g\ge 2cn^{-\tau}$.
Condition 3 (C3). $R=O(n^{\varepsilon})$ and $J=\max_{1\le g\le G}N_g=O(n^{\kappa})$, where $\varepsilon\ge 0$, $\kappa\ge 0$ and $2\tau+2\varepsilon+2\kappa<1$.
Condition 4 (C4). There exists a positive constant $c_3$ such that $0<f_g(x\mid Y=r)<c_3$ for any $1\le r\le R$ and any $x$ in the domain of $X_g$, where $f_g(x\mid Y=r)$ is the Lebesgue density function of $X_g$ conditional on $Y=r$.
Condition 5 (C5). There exist constants $c_4>0$ and $0\le\rho<1/2$ such that $f_g(x)\ge c_4n^{-\rho}$ for any $1\le g\le G$ and any $x$ in the domain of $X_g$, where $f_g(x)$ is the Lebesgue density function of $X_g$. Furthermore, $f_g(x)$ is continuous in the domain of $X_g$.
Condition 6 (C6). $R=O(n^{\varepsilon})$ and $J=\max_{1\le g\le G}N_g=O(n^{\kappa})$, where $\varepsilon\ge 0$, $\kappa\ge 0$ and $2\tau+2\varepsilon+2\kappa+2\rho<1$.
Condition 7 (C7). $\liminf_{p\to\infty}\{\min_{g\in\mathcal D}e_g-\max_{g\in\mathcal I}e_g\}\ge\delta$, where $\delta>0$ is a constant.
Condition (C1) guarantees that the proportion of each class of the variables is neither extremely small nor extremely large; similar assumptions were made in Huang et al. [8] and Cui et al. [4]. Following Fan and Lv [5] and Cui et al. [4], Condition (C2) allows the minimum true signal to vanish at the rate $n^{-\tau}$ as the sample size goes to infinity. Following Ni and Fang [12] and He and Deng [7], Condition (C3) allows the number of covariate categories and the number of response classes to diverge at certain rates, and Condition (C6) slightly modifies Condition (C3) to account for $\rho$. To ensure that the sample percentiles are close to the true percentiles, Condition (C4) rules out the extreme case in which some $X_g$ places a heavy mass in a small range. Condition (C5) requires $n^{-\rho}$ as the lower bound of the density. Condition (C7), where $\mathcal I=\{1,\dots,G\}\setminus\mathcal D$ denotes the inactive set, is used for the ranking consistency property proposed by Cui et al. [4] and Zhu et al. [24]; similar assumptions were made by Ni and Fang [12] and He and Deng [7].
Theorem 3.1 (Sure screening property). Under conditions (C1)–(C3), if all the covariates are categorical, then
$$P(\mathcal D\subseteq\hat{\mathcal D})\ge 1-O\Big(p\exp\big\{-bn^{1-(2\tau+2\varepsilon+2\kappa)}+(\varepsilon+\kappa)\log n\big\}\Big),$$
where $b$ is a positive constant. Hence, if $\log p=O(n^{\alpha})$ with $\alpha<1-(2\tau+2\varepsilon+2\kappa)$, GP-SIS has the sure screening property.
Theorem 3.2 (Sure screening property). Under conditions (C4)–(C6), when the covariates include both continuous and categorical variables,
$$P(\mathcal D\subseteq\hat{\mathcal D})\ge 1-O\Big(p\exp\big\{-bn^{1-(2\tau+2\varepsilon+2\kappa+2\rho)}+(\varepsilon+\kappa)\log n\big\}\Big),$$
where $b$ is a positive constant. Hence, if $\log p=O(n^{\alpha})$ with $\alpha<1-(2\tau+2\varepsilon+2\kappa+2\rho)$, GP-SIS has the sure screening property.
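Theorem 3.3 (Ranking consistency property). Under conditions (C1), (C4), (C5) and (C7), if $\frac{\log(RN_g)}{\log n}=O(1)$ and $\frac{\max\{\log p,\log n\}R^4N_g^4}{n^{1-2\rho}}=O(1)$, then $\liminf_{n\to\infty}\{\min_{g\in\mathcal D}\hat e_g-\max_{g\in\mathcal I}\hat e_g\}>0$ almost surely.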
Theorem 3.3 shows that the proposed screening index separates active and inactive groups at the sample level.
In this subsection, we conduct four simulation studies to demonstrate the finite-sample performance of the group screening method described in Section 2, comparing GP-SIS with IG-SIS [12] and GIG-SIS [7]. In each replication, we rank the features by each screening criterion and record the minimum model size (MMS) required to include all active features. The 5%, 25%, 50%, 75% and 95% quantiles of the MMS over 100 replications summarize the screening performance. Following Shao and Zhang [20], we also report the coverage probability of all active features, denoted CPa; a CPa close to 1 is evidence of sure screening. We further consider predictor-specific inclusion proportions, denoted CP1, CP2 and CP3: the coverage probabilities that models of size $[n/\log n]$, $2[n/\log n]$ and $3[n/\log n]$, respectively, include the active covariates. These let us examine which active predictors are easier to detect.
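The evaluation criteria can be computed as sketched below. The MMS follows the description above; for the coverage summaries we assume, as one plausible reading, that a model of size $m[n/\log n]$ covers the active set in a replication when it contains every active feature, so the coverage is the proportion of replications whose MMS is at most that size. The helper names are hypothetical and feature indices are 0-based.

```python
import numpy as np

def minimum_model_size(scores, active):
    """Smallest k such that the k top-ranked features contain every active feature."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    rank = np.empty(len(order), dtype=int)
    rank[order] = np.arange(1, len(order) + 1)      # rank[i] = 1-based rank of feature i
    return int(max(rank[a] for a in active))

def summarize(mms_per_rep, n):
    """MMS quantiles and (assumed) coverage at model sizes d, 2d, 3d with d = [n / log n]."""
    mms = np.asarray(mms_per_rep)
    d = int(np.floor(n / np.log(n)))
    quantiles = np.percentile(mms, [5, 25, 50, 75, 95])
    coverage = [float(np.mean(mms <= m * d)) for m in (1, 2, 3)]
    return quantiles, coverage

print(minimum_model_size([0.9, 0.1, 0.8, 0.3, 0.2], active=[0, 3]))   # 3
```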
Model 1: categorical covariates and binary response
Following Ni and Fang [12] and He and Deng [7], we first assume a model in which the response $y_i$ is binary ($R=2$) and all covariates are categorical. We consider two distributions for $y_i$:
(1) Balanced, $p_r=P(y_i=r)=1/2$;
(2) Unbalanced, $p_r=\frac{2}{3R}\Big[1+\frac{R-r}{R-1}\Big]$, so that $\max_{1\le r\le R}p_r=2\min_{1\le r\le R}p_r$.
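The unbalanced design in item (2) can be checked numerically: the probabilities sum to one and the largest is exactly twice the smallest. The sketch below, with the hypothetical helper name `class_probs`, reproduces both designs; the same two distributions are used in Models 2 and 3.

```python
import numpy as np

def class_probs(R, balanced=True):
    """Response distribution p_r, r = 1..R, for the balanced or unbalanced design."""
    if balanced:
        return np.full(R, 1.0 / R)
    r = np.arange(1, R + 1)
    return 2.0 * (1.0 + (R - r) / (R - 1)) / (3.0 * R)

print(class_probs(2, balanced=False))   # p_1 = 2/3, p_2 = 1/3
p = class_probs(10, balanced=False)
print(p.sum(), p.max() / p.min())       # sums to 1 (up to rounding), ratio 2
```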
The true model is $\mathcal D=\{1,\dots,9\}$ with $d_0=9$ active covariates forming $d_{0G}=3$ active groups of size 3. Conditional on $y_i=r$, the latent variable $z_i=(z_{i,1},\dots,z_{i,P})$ is generated with $z_{i,k}\sim N(u_{rk},1)$, $1\le k\le P$, where the means that define the active covariates are:
(1) If $k>d_0$, then $u_{rk}=0$;
(2) If $k\le d_0$ and $r=1$, then $u_{rk}=-0.5$;
(3) If $k\le d_0$ and $r=2$, then $u_{rk}=0.5$.
Next, we use quantiles of the standard normal distribution to generate the covariates:
(1) When $k$ is odd, $x_{i,k}=\sum_{j=1}^{1}I(z_{i,k}>z_{(j/2)})+1$;
(2) When $k$ is even, $x_{i,k}=\sum_{j=1}^{4}I(z_{i,k}>z_{(j/5)})+1$,
where $z_{(\alpha)}$ denotes the $\alpha$-th percentile of the standard normal distribution.
Thus, half of the $P$ covariates have two categories and the other half have five. In this model, we consider $P=1500$ and $n=80,100,120$.
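The Model 1 design can be sketched as the data generator below, following the discretization rules above. It is an illustration only: the function name, the seed handling and the use of Python's `statistics.NormalDist` for standard normal percentiles are choices made here, and the authors' generator may differ in details.

```python
import numpy as np
from statistics import NormalDist

def simulate_model1(n, P, balanced=True, d0=9, seed=0):
    """Sketch of Model 1: binary y, alternating 2- and 5-category covariates."""
    rng = np.random.default_rng(seed)
    R = 2
    r = np.arange(1, R + 1)
    p = np.full(R, 1 / R) if balanced else 2 * (1 + (R - r) / (R - 1)) / (3 * R)
    y = rng.choice(r, size=n, p=p)
    # u_rk = -0.5 (r = 1) or 0.5 (r = 2) for k <= d0, and u_rk = 0 otherwise
    u = np.zeros((n, P))
    u[:, :d0] = np.where(y == 1, -0.5, 0.5)[:, None]
    z = rng.normal(u, 1.0)                     # z_ik ~ N(u_rk, 1) given y_i = r
    cuts = {m: np.array([NormalDist().inv_cdf(j / m) for j in range(1, m)]) for m in (2, 5)}
    X = np.empty((n, P), dtype=int)
    for k in range(1, P + 1):                  # k is the 1-based covariate index
        m = 2 if k % 2 == 1 else 5             # odd k: 2 categories, even k: 5
        # x_ik = sum_j I(z_ik > z_(j/m)) + 1
        X[:, k - 1] = np.sum(z[:, [k - 1]] > cuts[m][None, :], axis=1) + 1
    return X, y

X, y = simulate_model1(n=80, P=1500, balanced=False)
```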
Table 2 shows the evaluation criteria over 100 simulations for Model 1. The results indicate that the proposed GP-SIS works well: for every sample size and both response structures, its MMS equals $d_{0G}=3$ at all reported quantiles and all of its coverage probabilities equal 1. GIG-SIS performs similarly, and its MMS equals 3 at all quantiles once $n=120$. Comparing the two response structures, the MMS of IG-SIS is smaller under the unbalanced response for $n=100$ and $n=120$, but not for $n=80$. Moreover, GP-SIS is more stable than the other two methods, because its MMS does not fluctuate across replications.
Method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa | |
Balanced Y, n=80, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 89.8 | 120.0 | 162.5 | 193.0 | 231.1 | 0.72 | 0.79 | 0.80 | 0.00 | |
Balanced Y, n=100, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 17.0 | 22.0 | 26.0 | 31.0 | 40.1 | 0.89 | 0.97 | 1.00 | 1.00 | |
Balanced Y, n=120, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 10.0 | 11.0 | 13.0 | 15.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
Unbalanced Y, n=80, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 4.0 | 5.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 130.9 | 197.0 | 226.0 | 282.5 | 373.4 | 0.66 | 0.79 | 0.84 | 0.00 | |
Unbalanced Y, n=100, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 0.89 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 11.0 | 15.0 | 17.0 | 20.0 | 27.0 | 0.90 | 0.99 | 1.00 | 1.00 | |
Unbalanced Y, n=120, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 10.0 | 10.0 | 12.0 | 1.00 | 1.00 | 1.00 | 1.00 |
Model 2: categorical covariates and multi-class response
We now consider covariates with more categories and a multi-class response $y_i$ with $R=10$. We consider two distributions for $y_i$:
(1) Balanced, $p_r=P(y_i=r)=1/R$;
(2) Unbalanced, $p_r=\frac{2}{3R}\Big[1+\frac{R-r}{R-1}\Big]$, so that $\max_{1\le r\le R}p_r=2\min_{1\le r\le R}p_r$.
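The true model is $\mathcal D=\{1,\dots,9\}$ with $d_0=9$ active covariates forming $d_{0G}=3$ active groups. Conditional on $y_i=r$, the latent variable $z_i=(z_{i,1},\dots,z_{i,P})$ has components $z_{i,k}=\varepsilon_{i,k}+\mu_{i,k}$, where $\varepsilon_{i,k}\sim t(4)$ and $\mu_{i,k}=u_{rk}$, and covariate $X_k$ is obtained as $x_{i,k}=f_k(\varepsilon_{i,k}+\mu_{i,k})$, where $f_k(\cdot)$ discretizes its argument using quantiles of the standard normal distribution. The active covariates are constructed by defining $u_{rk}$:
(1) If $k>d_0$, then $u_{rk}=0$;
(2) If $k\le d_0$, then $u_{rk}=1.5\times(-0.9)^r$.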
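Next, we apply $f_k(\cdot)$ to generate the covariates, with $P=2000$ and $n=100,150,200$ in this model:
(1) For $k\le 400$, $f_k(\varepsilon_{i,k}+\mu_{i,k})=\sum_{j=1}^{1}I(z_{i,k}>z_{(j/2)})+1$;
(2) For $401\le k\le 800$, $f_k(\varepsilon_{i,k}+\mu_{i,k})=\sum_{j=1}^{3}I(z_{i,k}>z_{(j/4)})+1$;
(3) For $801\le k\le 1200$, $f_k(\varepsilon_{i,k}+\mu_{i,k})=\sum_{j=1}^{5}I(z_{i,k}>z_{(j/6)})+1$;
(4) For $1201\le k\le 1600$, $f_k(\varepsilon_{i,k}+\mu_{i,k})=\sum_{j=1}^{7}I(z_{i,k}>z_{(j/8)})+1$;
(5) For $k\ge 1601$, $f_k(\varepsilon_{i,k}+\mu_{i,k})=\sum_{j=1}^{9}I(z_{i,k}>z_{(j/10)})+1$.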
Thus, covariates with two, four, six, eight and ten categories each account for one-fifth of the $P$ covariates.
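A corresponding sketch of the Model 2 generator, assuming the $t(4)$ noise is added to the latent mean and that each block of 400 covariates is cut at the standard normal percentiles $z_{(j/m)}$ with $m=2,4,6,8,10$. The function name is hypothetical. `np.searchsorted` with `side="left"` counts the cut points strictly below each value, which equals $\sum_j I(z_{i,k}>z_{(j/m)})$.

```python
import numpy as np
from statistics import NormalDist

def simulate_model2(n, P=2000, balanced=True, d0=9, R=10, seed=0):
    """Sketch of Model 2: R-class y, t(4) latent noise, 2/4/6/8/10-category blocks."""
    rng = np.random.default_rng(seed)
    r = np.arange(1, R + 1)
    p = np.full(R, 1 / R) if balanced else 2 * (1 + (R - r) / (R - 1)) / (3 * R)
    y = rng.choice(r, size=n, p=p)
    mu = np.zeros((n, P))
    mu[:, :d0] = (1.5 * (-0.9) ** y)[:, None]         # u_rk = 1.5 * (-0.9)^r for k <= d0
    z = rng.standard_t(4, size=(n, P)) + mu           # z_ik = eps_ik + mu_ik, eps ~ t(4)
    cuts = {m: np.array([NormalDist().inv_cdf(j / m) for j in range(1, m)])
            for m in (2, 4, 6, 8, 10)}
    n_cat = 2 * (np.minimum(np.arange(P) // 400, 4) + 1)   # blocks of 400: 2, 4, ..., 10
    X = np.empty((n, P), dtype=int)
    for k in range(P):
        # count of cut points strictly below z_ik = sum_j I(z_ik > z_(j/m))
        X[:, k] = np.searchsorted(cuts[int(n_cat[k])], z[:, k], side="left") + 1
    return X, y
```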
Table 3 shows the evaluation criteria over 100 simulations for Model 2. IG-SIS and GIG-SIS perform worse under Model 1 than under Model 2. In this more complex model, GP-SIS still performs better than IG-SIS; in particular, GP-SIS and GIG-SIS attain the smallest possible MMS, $d_{0G}=3$, even at the smallest sample size. As $n$ increases, the MMS of IG-SIS approaches $d_0=9$ and its coverage probabilities increase to 1. Comparing the two response structures, the upper quantiles of the MMS of IG-SIS are smaller under the unbalanced response at $n=100$. Furthermore, GP-SIS is stable, because its MMS does not fluctuate across replications.
Method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa | |
Balanced Y, n=100, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 10.0 | 45.1 | 0.99 | 0.99 | 0.99 | 0.95 | |
Balanced Y, n=150, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
Balanced Y, n=200, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
Unbalanced Y, n=100, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 10.0 | 10.0 | 12.0 | 0.66 | 0.79 | 0.84 | 0.00 | |
Unbalanced Y, n=150, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
Unbalanced Y, n=200, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 |
Model 3: continuous and categorical covariates
Finally, we consider a more complex example with both continuous and categorical covariates, in which the response $y_i$ is multi-class with $R=4$. We consider two distributions for $y_i$:
(1) Balanced, $p_r=P(y_i=r)=1/R$;
(2) Unbalanced, $p_r=\frac{2}{3R}\Big[1+\frac{R-r}{R-1}\Big]$, so that $\max_{1\le r\le R}p_r=2\min_{1\le r\le R}p_r$.
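In this model, we consider $P=3000$ and $n=180,220,260$. The true model is $\mathcal D=\{1,2,3,751,752,753,1501,1502,1503,1504,1505,1506\}$ with $d_0=12$, and the group size is 4. Conditional on $y_i$, the latent variable $z_i=(z_{i,1},\dots,z_{i,P})$ is generated with $z_{i,k}\sim N(\mu_{ik},1)$, $1\le k\le P$, where $\mu_i=(\mu_{i1},\dots,\mu_{iP})^{T}$ with $\mu_{ik}=(-1)^r\theta_{rk}$ when $y_i=r$ and $k\in\mathcal D$, and $\mu_{ik}=0$ when $k\notin\mathcal D$. Following He and Deng [7] and Ni and Fang [12], the values of $\theta_{rk}$ are listed in Table 4, whose columns $k=1,\dots,12$ correspond to the 12 elements of $\mathcal D$ in increasing order. To generate $X_k$: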
$\theta_{rk}$ | $k=1$ | $k=2$ | $k=3$ | $k=4$ | $k=5$ | $k=6$ | $k=7$ | $k=8$ | $k=9$ | $k=10$ | $k=11$ | $k=12$ |
r=1 | 0.2 | 0.8 | 0.7 | 0.2 | 0.2 | 0.9 | 0.1 | 0.1 | 0.7 | 0.7 | 0.3 | 0.5 |
r=2 | 0.9 | 0.3 | 0.3 | 0.7 | 0.8 | 0.4 | 0.7 | 0.6 | 0.4 | 0.4 | 0.8 | 0.2 |
r=3 | 0.1 | 0.9 | 0.9 | 0.1 | 0.3 | 0.1 | 0.4 | 0.3 | 0.6 | 0.6 | 0.4 | 0.7 |
r=4 | 0.7 | 0.2 | 0.2 | 0.6 | 0.7 | 0.6 | 0.8 | 0.9 | 0.1 | 0.1 | 0.8 | 0.6 |
For $k\le 750$, $x_{ik}=j$ if $z_{ik}\in(z_{(j-1)/4},z_{j/4}]$, $j=1,\dots,4$;
For $750<k\le 1500$, $x_{ik}=j$ if $z_{ik}\in(z_{(j-1)/10},z_{j/10}]$, $j=1,\dots,10$;
For $k\ge 1501$, $x_{ik}=z_{ik}$.
Thus, covariates with four categories and covariates with ten categories each account for one-quarter of the $P$ covariates, and the remaining half are continuous. Similarly, among the 12 active covariates, three have four categories, three have ten categories, and the remaining six, half of the active covariates, are continuous. For the continuous covariates, we apply $J=4,8,10$ slices and denote the corresponding approaches by GP-SIS-4, IG-SIS-4, GP-SIS-8, IG-SIS-8, GP-SIS-10 and IG-SIS-10. For grouped covariates, He and Deng [7] proposed a grouped feature screening algorithm that uses joint information entropy to screen important grouped covariates; we denote its versions by GIG-SIS-4, GIG-SIS-8 and GIG-SIS-10.
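The Model 3 design, including the $\theta_{rk}$ values of Table 4, can be sketched as below. This assumes that the columns $k=1,\dots,12$ of Table 4 correspond to the elements of $\mathcal D$ in increasing order; the function and constant names are hypothetical. Categorical columns are stored as floats in the same array as the continuous ones.

```python
import numpy as np
from statistics import NormalDist

# Table 4: rows r = 1..4; columns assumed to follow the elements of D in increasing order
THETA = np.array([
    [0.2, 0.8, 0.7, 0.2, 0.2, 0.9, 0.1, 0.1, 0.7, 0.7, 0.3, 0.5],
    [0.9, 0.3, 0.3, 0.7, 0.8, 0.4, 0.7, 0.6, 0.4, 0.4, 0.8, 0.2],
    [0.1, 0.9, 0.9, 0.1, 0.3, 0.1, 0.4, 0.3, 0.6, 0.6, 0.4, 0.7],
    [0.7, 0.2, 0.2, 0.6, 0.7, 0.6, 0.8, 0.9, 0.1, 0.1, 0.8, 0.6],
])
ACTIVE = [1, 2, 3, 751, 752, 753, 1501, 1502, 1503, 1504, 1505, 1506]

def simulate_model3(n, P=3000, balanced=True, seed=0):
    """Sketch of Model 3: R = 4, 4- and 10-category blocks, continuous covariates for k > 1500."""
    rng = np.random.default_rng(seed)
    R = 4
    r = np.arange(1, R + 1)
    p = np.full(R, 1 / R) if balanced else 2 * (1 + (R - r) / (R - 1)) / (3 * R)
    y = rng.choice(r, size=n, p=p)
    mu = np.zeros((n, P))
    for col, k in enumerate(ACTIVE):                 # mu_ik = (-1)^r theta_rk for k in D
        mu[:, k - 1] = (-1.0) ** y * THETA[y - 1, col]
    z = rng.normal(mu, 1.0)                          # z_ik ~ N(mu_ik, 1)
    X = z.copy()                                     # k >= 1501: x_ik = z_ik
    for lo, hi, m in [(0, 750, 4), (750, 1500, 10)]:
        cuts = np.array([NormalDist().inv_cdf(j / m) for j in range(1, m)])
        # x_ik = j when z_ik lies in (z_((j-1)/m), z_(j/m)]
        X[:, lo:hi] = np.searchsorted(cuts, z[:, lo:hi], side="left") + 1
    return X, y
```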
Tables 5 and 6 present the results over 100 simulations for the balanced and unbalanced cases, respectively. For every sample size, GP-SIS attains an MMS of 4 at all reported quantiles, and all of its coverage probabilities equal 1. The results of GP-SIS coincide with those of GIG-SIS in all reported summaries, which indicates that GP-SIS performs effective group feature screening. Comparing the two response structures, the 95% quantile of the MMS of IG-SIS is smaller under the unbalanced response for every sample size and number of slices.
Method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa | |
Balanced Y, n=180, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 15.0 | 22.0 | 32.5 | 52.8 | 92.1 | 0.94 | 0.98 | 0.99 | 0.88 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 13.0 | 16.0 | 20.0 | 31.3 | 79.1 | 0.97 | 0.99 | 0.99 | 0.93 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 14.0 | 17.0 | 22.0 | 39.3 | 123.1 | 0.96 | 0.99 | 0.99 | 0.87 | |
Balanced Y, n=220, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 13.0 | 14.0 | 16.0 | 19.0 | 26.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 13.0 | 15.0 | 18.0 | 21.0 | 29.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 14.0 | 17.0 | 22.0 | 26.0 | 37.4 | 0.99 | 1.00 | 1.00 | 1.00 | |
Balanced Y, n=260, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 12.0 | 12.0 | 14.0 | 15.0 | 18.1 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 12.0 | 13.0 | 14.0 | 17.0 | 21.1 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 12.0 | 14.0 | 16.0 | 19.0 | 28.1 | 1.00 | 1.00 | 1.00 | 1.00 |
Method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa | |
Unbalanced Y, n=180, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 39.0 | 43.0 | 46.0 | 50.3 | 55.1 | 0.84 | 0.97 | 1.00 | 1.00 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 22.0 | 26.0 | 29.0 | 32.0 | 35.1 | 0.93 | 1.00 | 1.00 | 1.00 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 21.0 | 23.0 | 26.0 | 29.0 | 33.0 | 0.95 | 1.00 | 1.00 | 1.00 | |
Unbalanced Y, n=220, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 16.0 | 18.0 | 20.0 | 22.0 | 25.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 13.0 | 14.0 | 16.0 | 17.0 | 18.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 15.0 | 17.0 | 19.0 | 20.0 | 22.1 | 1.00 | 1.00 | 1.00 | 1.00 | |
Unbalanced Y, n=260, p=3000 | ||||||||||
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-4 | 12.0 | 12.0 | 13.0 | 13.0 | 14.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-8 | 12.0 | 12.0 | 13.0 | 13.0 | 14.1 | 1.00 | 1.00 | 1.00 | 1.00 | |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS-10 | 12.0 | 13.0 | 13.0 | 14.0 | 15.0 | 1.00 | 1.00 | 1.00 | 1.00 |
Furthermore, GP-SIS and GIG-SIS are stable: their MMS does not fluctuate across replications under either response type. Their results are also identical for $J=4$, 8 and 10 slices of the continuous covariates, so the performance of these two methods appears insensitive to the number of slices, whereas the MMS of IG-SIS varies somewhat with $J$.
Model 4: computational time complexity analysis
This model is similar to Model 1, but we consider only the balanced response, $P(y_i=r)=1/2$. The true model is $\mathcal D=\{1,\dots,9\}$ with $d_0=9$ and $d_{0G}=3$, and the group size is 3. The active and irrelevant covariates are generated as in Model 1, so half of the $P$ covariates have two categories and the other half have five. Model 4 fixes the sample size at $n=150$ and increases the covariate dimension $P$ from 1500 to 10500 in steps of 1000. For each screening method and each $P$, we record the running time in each of 100 replications and use the median as the running-time index. We then compare how the running times of the three methods grow as $P$ increases, to compare their computational complexities.
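The timing protocol (median over 100 replications, with $P$ on the grid $1500,2500,\dots,10500$) can be sketched with the standard library. The `screen` and `make_data` callables are hypothetical placeholders for a screening method and a Model 4 data generator; only the screening call is timed.

```python
import statistics
import time

def median_runtime(screen, make_data, reps=100):
    """Median wall-clock time of screen(*make_data(rep)) over `reps` replications."""
    times = []
    for rep in range(reps):
        args = make_data(rep)                   # data generation is not timed
        start = time.perf_counter()
        screen(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

dimensions = list(range(1500, 10501, 1000))     # P = 1500, 2500, ..., 10500 (10 settings)
```

For each value in `dimensions`, one would pass a generator that draws an $n=150$ sample with that many covariates.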
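Table 7 shows that, with the sample size fixed at $n=150$, the median running times of the three methods grow approximately linearly as the covariate dimension $p$ increases linearly. All three methods are slightly slower at $p=9500$ than at $p=10500$. The running times of GP-SIS and GIG-SIS are of the same order of magnitude, but GP-SIS is the fastest method at every dimension, followed by IG-SIS and then GIG-SIS. For example, at $p=10500$ the median running times of GP-SIS, IG-SIS and GIG-SIS are 15.757, 18.478 and 22.291, respectively. Thus, GP-SIS and GIG-SIS are both robust in ultrahigh-dimensional feature screening, but GP-SIS consistently requires less computation time. For grouped feature screening, our method is therefore superior to GIG-SIS in computational cost.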
Screening method | p = 1500 | p = 2500 | p = 3500 | p = 4500 | p = 5500 | p = 6500 | p = 7500 | p = 8500 | p = 9500 | p = 10500 |
GP-SIS | 2.636 | 4.202 | 6.173 | 6.617 | 8.124 | 9.275 | 10.705 | 12.135 | 17.139 | 15.757 |
IG-SIS | 3.145 | 5.127 | 7.512 | 7.923 | 9.696 | 10.967 | 12.629 | 14.296 | 19.841 | 18.478 |
GIG-SIS | 3.677 | 6.283 | 8.933 | 9.393 | 11.532 | 13.085 | 15.079 | 17.103 | 24.702 | 22.291 |
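The roughly linear growth in Table 7 can be quantified with an ordinary least-squares line through the reported medians. The sketch below simply refits the numbers in the table. The slope is the extra median running time per additional covariate, in whatever time unit Table 7 uses (the unit is not stated here).

```python
from typing import Dict, List, Sequence, Tuple

DIMS = [1500, 2500, 3500, 4500, 5500, 6500, 7500, 8500, 9500, 10500]
# Median running times copied from Table 7.
MEDIAN_TIMES: Dict[str, List[float]] = {
    "GP-SIS": [2.636, 4.202, 6.173, 6.617, 8.124, 9.275, 10.705, 12.135, 17.139, 15.757],
    "IG-SIS": [3.145, 5.127, 7.512, 7.923, 9.696, 10.967, 12.629, 14.296, 19.841, 18.478],
    "GIG-SIS": [3.677, 6.283, 8.933, 9.393, 11.532, 13.085, 15.079, 17.103, 24.702, 22.291],
}


def ols_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (slope, intercept, R^2) of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sst = sum((b - my) ** 2 for b in y)
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return slope, intercept, 1.0 - sse / sst


fits = {name: ols_line(DIMS, times) for name, times in MEDIAN_TIMES.items()}
```

With these values, the fitted slope is smallest for GP-SIS and largest for GIG-SIS, and $R^2$ exceeds 0.9 for all three methods.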
In this subsection, we analyze a real data set from the feature selection database of Arizona State University (http://featureselection.asu.edu/). The lung data contain 203 samples and 3312 features, and the response is unbalanced: the five classes contain 139, 17, 21, 20 and 6 samples, respectively. The covariates are continuous and also exhibit group correlations. We randomly divided the data into a training set (90% of the samples) and a test set (10%), with sample sizes $n=182$ and $n=21$, respectively. Both sets have dimension $p=3312$.
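The class sizes and the 90%/10% split can be checked with a few lines. The sketch assumes a simple (non-stratified) random split with the training size rounded down, which reproduces $n=182$ and $n=21$. The source does not say whether the split was stratified. With only 6 samples in class 5, a purely random test set may contain very few of them.

```python
import random
from typing import List, Tuple

# Class sizes of the lung data reported above.
CLASS_COUNTS = {1: 139, 2: 17, 3: 21, 4: 20, 5: 6}


def random_split(n: int, train_frac: float = 0.9, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle indices 0..n-1 and cut them into training and test parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(train_frac * n)  # rounds down: int(0.9 * 203) == 182
    return idx[:n_train], idx[n_train:]


n_total = sum(CLASS_COUNTS.values())           # 203 samples
proportions = {k: v / n_total for k, v in CLASS_COUNTS.items()}
train_idx, test_idx = random_split(n_total)
```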
We used ten-fold cross-validation to assess the classification algorithms, so that the reported accuracy does not depend on a particular choice of training data. Active covariates were selected by GP-SIS-10, IG-SIS-10 and GIG-SIS-10 on the training data. The selected covariates were then classified with a Support Vector Machine (SVM) [19], a Random Forest (RF) and a Decision Tree (DT) [10]. The evaluation indices are the G-mean and the F-measure. For unbalanced high-dimensional data, higher G-means and F-measures indicate better feature screening performance [7]. Table 8 reports the G-mean and F-measure on the training and test data for each combination of the three screening methods (GP-SIS-10, IG-SIS-10 and GIG-SIS-10) and the three classifiers. On the training data, GP-SIS gives F-measures closer to 1 than the other two methods for almost every class and classifier. However, the test F-measure of some classes is small. For class 5, the smallest class, it is close to 0 for several screening-classifier combinations. Overall, the proposed GP-SIS method performs better.
Index | Screening method | Response class 1 | Response class 2 | Response class 3 | Response class 4 | Response class 5 |
classification method | SVM | |||||
G-mean(train data) | GP-SIS | 0.9986 | 0.7995 | 0.8108 | 0.8131 | 0.7801 |
GIG-SIS | 0.9979 | 0.7903 | 0.8095 | 0.8111 | 0.7768 | |
IG-SIS | 0.9992 | 0.7914 | 0.7865 | 0.8084 | 0.7738 | |
G-mean(test data) | GP-SIS | 0.9941 | 0.9790 | 0.9825 | 0.9973 | 0.9888 |
GIG-SIS | 0.9954 | 0.9773 | 0.9937 | 1.0000 | 0.9947 | |
IG-SIS | 0.9959 | 0.9841 | 0.9758 | 1.0000 | 0.9943 | |
F-measure(train data) | GP-SIS | 0.9762 | 0.8080 | 0.8518 | 0.8576 | 0.6432 |
GIG-SIS | 0.9635 | 0.6843 | 0.7873 | 0.7942 | 0.5259 | |
IG-SIS | 0.9480 | 0.6295 | 0.5900 | 0.7255 | 0.4425 | |
F-measure(test data) | GP-SIS | 0.8946 | 0.3352 | 0.4203 | 0.4828 | 0.0900 |
GIG-SIS | 0.9208 | 0.3433 | 0.5683 | 0.5446 | 0.2000 | |
IG-SIS | 0.9116 | 0.3555 | 0.3422 | 0.5502 | 0.2400 | |
classification method | DT | |||||
G-mean(train data) | GP-SIS | 0.9944 | 0.7861 | 0.7976 | 0.7988 | 0.7511 |
GIG-SIS | 0.9931 | 0.7715 | 0.7829 | 0.7914 | 0.7449 | |
IG-SIS | 0.9952 | 0.7821 | 0.7777 | 0.8013 | 0.7471 | |
G-mean(test data) | GP-SIS | 0.9907 | 0.9896 | 0.9849 | 0.9949 | 0.9829 |
GIG-SIS | 0.9927 | 0.9646 | 0.9831 | 0.9891 | 0.9816 | |
IG-SIS | 0.9917 | 0.9840 | 0.9751 | 1.0000 | 0.9829 | |
F-measure(train data) | GP-SIS | 0.9190 | 0.5290 | 0.5988 | 0.6085 | 0.0000 |
GIG-SIS | 0.89029 | 0.3551 | 0.4698 | 0.5202 | 0.0343 | |
IG-SIS | 0.9057 | 0.4776 | 0.4432 | 0.5929 | 0.0000 | |
F-measure(test data) | GP-SIS | 0.8836 | 0.4005 | 0.4233 | 0.4779 | 0.0000 |
GIG-SIS | 0.8459 | 0.1708 | 0.3441 | 0.3847 | 0.0000 | |
IG-SIS | 0.8745 | 0.3195 | 0.2819 | 0.4683 | 0.0000 | |
classification method | RF | |||||
G-mean(train data) | GP-SIS | 1.0000 | 0.8099 | 0.8188 | 0.8166 | 0.7848 |
GIG-SIS | 1.0000 | 0.8099 | 0.8187 | 0.8166 | 0.7847 | |
IG-SIS | 1.0000 | 0.8045 | 0.8098 | 0.8143 | 0.7818 | |
G-mean(test data) | GP-SIS | 0.9963 | 0.9819 | 0.9884 | 0.9975 | 0.9834 |
GIG-SIS | 0.9978 | 0.9598 | 0.9819 | 0.9913 | 0.9859 | |
IG-SIS | 0.9958 | 0.9816 | 0.9761 | 1.0000 | 0.9975 | |
F-measure(train data) | GP-SIS | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
GIG-SIS | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 | |
IG-SIS | 0.9846 | 0.8772 | 0.8925 | 0.9025 | 0.7340 | |
F-measure(test data) | GP-SIS | 0.9036 | 0.3533 | 0.5067 | 0.5138 | 0.0000 |
GIG-SIS | 0.8856 | 0.1300 | 0.3968 | 0.4885 | 0.0286 | |
IG-SIS | 0.9137 | 0.3605 | 0.3722 | 0.5303 | 0.2567 |
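For reference, the sketch below computes a one-vs-rest G-mean and F-measure for a single class. The G-mean is the geometric mean of sensitivity and specificity, and the F-measure is the harmonic mean of precision and recall. These are common textbook definitions and an assumption on our part. The exact definitions behind Table 8 follow [7] and may differ.

```python
import math
from typing import Hashable, Sequence, Tuple


def class_gmean_fmeasure(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                         cls: Hashable) -> Tuple[float, float]:
    """One-vs-rest G-mean and F-measure of class `cls`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    tn = sum(1 for t, p in pairs if t != cls and p != cls)
    recall = tp / (tp + fn) if tp + fn else 0.0        # sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    g_mean = math.sqrt(recall * specificity)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return g_mean, f_measure
```

Consider `y_true = [1, 1, 1, 2, 2, 3]` and `y_pred = [1, 1, 2, 2, 3, 3]`. Class 1 has recall 2/3, specificity 1 and precision 1, which gives a G-mean of about 0.816 and an F-measure of 0.8.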
Data with grouped continuous and categorical covariates and a categorical response are common in practice, but few screening methods apply to them. We propose GP-SIS, a screening procedure based on the Gini impurity, to screen grouped covariates effectively. GP-SIS is model-free and, theoretically, has the sure screening and ranking consistency properties. The simulations show that GP-SIS performs similarly to GIG-SIS both when all grouped covariates have the same number of categories and when these numbers differ. Overall, the simulation results show that GP-SIS performs better than the existing group feature screening method and single-covariate feature screening.
Group feature screening is difficult in the presence of missing data. In future work, we intend to propose a new group feature screening method for classification models in which either the covariates or the response variable are missing.
The work was supported by National Natural Science Foundation of China [grant number 71963008]. The authors are grateful to the editor and anonymous referee for their constructive comments that led to significant improvements in the paper.
All authors declare no conflicts of interest in this paper.
The lung biological data that support the findings of this study are available from the feature selection database of Arizona State University (http://featureselection.asu.edu/).
Proof of Proposition 2.1. Following the approach of Ni and Fang [12], define the convex function $f(x)=x^2$. By Jensen's inequality,
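$$
\begin{aligned}
\sum_{i=1}^{p_g}\sum_{j_i=1}^{J} w_{(j_1,\cdots,j_{p_g})}\sum_{r=1}^{R} p^2_{(j_1,\cdots,j_{p_g})r}
&=\sum_{r=1}^{R}\Big[\sum_{i=1}^{p_g}\sum_{j_i=1}^{J} w_{(j_1,\cdots,j_{p_g})}\, f\big(p_{(j_1,\cdots,j_{p_g})r}\big)\Big]\\
&\ge \sum_{r=1}^{R} f\Big(\sum_{i=1}^{p_g}\sum_{j_i=1}^{J} w_{(j_1,\cdots,j_{p_g})}\, p_{(j_1,\cdots,j_{p_g})r}\Big)\\
&=\sum_{r=1}^{R} f\Big(\sum_{i=1}^{p_g}\sum_{j_i=1}^{J} P(X_{gi}=j_i)\,P(Y=r\mid X_{gi}=j_i)\Big)
=\sum_{r=1}^{R} p_r^2,
\end{aligned}
$$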
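and therefore
$$
\begin{aligned}
GP&=\Big(1-\sum_{r=1}^{R}p_r^2\Big)-\sum_{j_g}^{J_g} w_{j_g}\Big(1-\sum_{r=1}^{R}p_{j_g r}^2\Big)\\
&=1-\sum_{r=1}^{R}p_r^2-\sum_{j_g}^{J_g} w_{j_g}+\sum_{j_g}^{J_g} w_{j_g}\sum_{r=1}^{R}p_{j_g r}^2\\
&=\sum_{j_g}^{J_g} w_{j_g}\sum_{r=1}^{R}p_{j_g r}^2-\sum_{r=1}^{R}p_r^2\ \ge\ 0,
\end{aligned}
$$
where the third equality uses $\sum_{j_g} w_{j_g}=1$ and the inequality is the Jensen bound above.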
Equality holds if and only if $p_{(j_1,\cdots,j_{p_g})r}=p_{(j_1,\cdots,j_{p_g})'r}$ for all $1\le r\le R$, $1\le i\le p_g$ and $1\le j_{p_g}\le j'_{p_g}\le J$, that is, if and only if $X_g$ and $Y$ are independent.
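Proposition 2.1 can be illustrated numerically. The sketch assumes the group $X_g$ has been coded into cells $j_g$, the combinations $(j_1,\dots,j_{p_g})$, and takes the joint probabilities $P(X_g=j_g, Y=r)$ as input. It evaluates the unnormalized index $GP=(1-\sum_r p_r^2)-\sum_{j_g} w_{j_g}(1-\sum_r p_{j_g r}^2)$, which is nonnegative and equals zero under independence. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def gini(probs: Sequence[float]) -> float:
    """Gini impurity 1 - sum_r q_r^2 of a probability vector."""
    return 1.0 - sum(q * q for q in probs)


def gp_index(joint: List[List[float]]) -> float:
    """GP = Gini(Y) - sum_j w_j * Gini(Y | X_g = j), from joint[j][r] = P(X_g = j, Y = r)."""
    w = [sum(row) for row in joint]                                   # w_j = P(X_g = j)
    p = [sum(row[r] for row in joint) for r in range(len(joint[0]))]  # p_r = P(Y = r)
    conditional = sum(wj * gini([x / wj for x in row])                # p_jr = joint / w_j
                      for wj, row in zip(w, joint) if wj > 0)
    return gini(p) - conditional


def random_joint(cells: int, classes: int, rng: random.Random) -> List[List[float]]:
    """A random joint probability table with positive entries summing to 1."""
    raw = [[rng.random() + 1e-3 for _ in range(classes)] for _ in range(cells)]
    total = sum(map(sum, raw))
    return [[x / total for x in row] for row in raw]
```

Take `joint = [[0.4, 0.1], [0.1, 0.4]]`. Then Gini(Y) = 0.5 and each conditional impurity is 0.32, so GP = 0.18. A product table $w_j p_r$ gives GP = 0 up to rounding.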
Proof of Proposition 2.2. By the same argument as in the proof of Proposition 2.1, $GP_J(Y\mid X_g)\ge 0$, with equality if and only if $p_{(j_1,\cdots,j_{p_g})r}=p_{(j_1,\cdots,j_{p_g})'r}$. Hence, when $X_g$ and $Y$ are independent, $GP_J(Y\mid X_g)=0$.
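$$
\begin{aligned}
&P\big(X_{g1}\in(q_{g1,(j-1)},q_{g1,(j)}],\cdots,X_{gp_g}\in(q_{gp_g,(j-1)},q_{gp_g,(j)}]\mid Y=r\big)\\
&\quad=\frac{P\big(Y=r\mid X_{g1}\in(q_{g1,(j-1)},q_{g1,(j)}],\cdots,X_{gp_g}\in(q_{gp_g,(j-1)},q_{gp_g,(j)}]\big)}{J\,P(Y=r)}\\
&\quad=\frac{P\big(Y=r\mid X_{g1}\in(q_{g1,(j'-1)},q_{g1,(j')}],\cdots,X_{gp_g}\in(q_{gp_g,(j'-1)},q_{gp_g,(j')}]\big)}{J\,P(Y=r)}\\
&\quad=P\big(X_{g1}\in(q_{g1,(j'-1)},q_{g1,(j')}],\cdots,X_{gp_g}\in(q_{gp_g,(j'-1)},q_{gp_g,(j')}]\mid Y=r\big).
\end{aligned}
$$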
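Since $\sum_{j=1}^{J}P\big(X_{g1}\in(q_{g1,(j-1)},q_{g1,(j)}],\cdots,X_{gp_g}\in(q_{gp_g,(j-1)},q_{gp_g,(j)}]\big)=1$, we have $P\big(X_{g1}\in(q_{g1,(j-1)},q_{g1,(j)}],\cdots,X_{gp_g}\in(q_{gp_g,(j-1)},q_{gp_g,(j)}]\big)=(1/J)^{p_g}$. If the covariates have similar distributions, we also have $P(X_{g1}\le q_{g1,j},\cdots,X_{gp_g}\le q_{gp_g,j}\mid Y=r)=(j/J)^{p_g}$.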
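A continuous covariate can be sliced at its quantiles into the intervals $(q_{(j-1)}, q_{(j)}]$, and the sketch below does this with empirical quantiles. With distinct values, each of the $J$ slices then holds about $n/J$ observations, which is the sample analogue of $P(X\in(q_{(j-1)},q_{(j)}])=1/J$. The quantile convention (inverse empirical CDF) and the function name are assumptions, and ties are not handled specially.

```python
import bisect
from collections import Counter
from typing import List, Sequence


def quantile_slices(values: Sequence[float], J: int) -> List[int]:
    """Label each value with its slice j in 1..J, where slice j is (q_(j-1), q_(j)]."""
    s = sorted(values)
    n = len(s)
    # Empirical k/J quantiles q_(1), ..., q_(J-1): the ceil(k n / J)-th order statistic.
    cuts = [s[-(-k * n // J) - 1] for k in range(1, J)]
    # bisect_left counts cut points strictly below v, so v <= q_(1) falls in slice 1, etc.
    return [bisect.bisect_left(cuts, v) + 1 for v in values]
```

For instance, `Counter(quantile_slices(list(range(100)), 4))` places 25 observations in each of the four slices.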
Lemma 1 (Bernstein inequality). If $Z_1,\cdots,Z_n$ are independent random variables with mean 0 and bounded support $[-M,M]$, then
$$P\Big(\Big|\sum_{i=1}^{n}Z_i\Big|>t\Big)\le 2\exp\Big\{-\frac{t^2}{2(v+Mt/3)}\Big\},$$
where $v\ge \mathrm{Var}\big(\sum_{i=1}^{n}Z_i\big)$.
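Lemma 2. For discrete group covariates $X_g$ and a discrete response $Y$, the following three inequalities hold:
(a) $P(|\hat p_r-p_r|>t)\le 2\exp\big\{-\frac{6nt^2}{3+4t}\big\}$;
(b) $P(|\hat w_{j_g}-w_{j_g}|>t)\le 2\exp\big\{-\frac{6nt^2}{3+4t}\big\}$;
(c) $P(|\hat p_{j_g r}-p_{j_g r}|>t)\le 2\exp\big\{-\frac{6nt^2}{3+4t}\big\}$.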
Proof of Lemma 2. The three inequalities are proved in the same way; inequalities (a) and (b) were given by Ni [14] and by He and Deng [7], respectively. We now prove inequality (c). By definition,
$$\hat p_{j_g r}=\frac{\sum_{i=1}^{n}I\{y_i=r,\,x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}{\sum_{i=1}^{n}I\{x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}.$$
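The expectation of $\hat p_{j_g r}$ is
$$E(\hat p_{j_g r})=E\Big(\frac{\sum_{i=1}^{n}I\{y_i=r,\,x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}{\sum_{i=1}^{n}I\{x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}\Big)=E\Big(\frac{I\{y_i=r,\,x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}{I\{x_{i,g1}=j_1,\cdots,x_{i,gp_g}=j_{p_g}\}}\Big)=p_{j_g r}.$$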
Let $Z_i=I\{y_i=r\mid X_{i,g1}=j_1,\cdots,X_{i,gp_g}=j_{p_g}\}-p_{j_g r}$. Then $|Z_i|\le 1$ and $\mathrm{Var}\big(\sum_{i=1}^{n}Z_i\big)=n\mathrm{Var}(Z_i)=np_{j_g r}(1-p_{j_g r})\le n/4$, so
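$$P(|\hat p_{j_g r}-p_{j_g r}|>t)=P\Big(\Big|n^{-1}\sum_{i=1}^{n}Z_i\Big|>t\Big)=P\Big(\Big|\sum_{i=1}^{n}Z_i\Big|>nt\Big)\le 2\exp\Big\{-\frac{n^2t^2}{2(n/4+nt/3)}\Big\}\le 2\exp\Big\{-\frac{6nt^2}{3+4t}\Big\}.$$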
The first inequality is the Bernstein inequality (Lemma 1) with $M=1$ and $v=n/4$; simplifying the exponent gives the bound in (c).
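The final simplification in the proof of inequality (c) is exact. With $M=1$, $v=n/4$ and threshold $nt$, the Bernstein exponent $n^2t^2/\{2(n/4+nt/3)\}$ equals $6nt^2/(3+4t)$. The sketch below checks this identity. It also compares the bound with a Monte Carlo estimate of $P(|\hat p-p|>t)$, treating the $n$ indicators as i.i.d. Bernoulli($p$) as the proof does. The simulation settings are illustrative choices, not taken from the source.

```python
import math
import random


def bernstein_bound(n: int, t: float, M: float = 1.0) -> float:
    """Lemma 1 applied to the sum of n centred indicators: v = n/4, threshold n*t."""
    v, s = n / 4.0, n * t
    return 2.0 * math.exp(-s * s / (2.0 * (v + M * s / 3.0)))


def lemma2_bound(n: int, t: float) -> float:
    """Right-hand side of Lemma 2: 2 exp{-6 n t^2 / (3 + 4 t)}."""
    return 2.0 * math.exp(-6.0 * n * t * t / (3.0 + 4.0 * t))


def empirical_tail(n: int, p: float, t: float, reps: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(|p_hat - p| > t), p_hat the mean of n Bernoulli(p) draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        p_hat = sum(rng.random() < p for _ in range(n)) / n
        hits += abs(p_hat - p) > t
    return hits / reps
```

For $n=200$ and $t=0.1$ the bound is about 0.059, while the simulated tail probability is far smaller. This gap is expected from a worst-case exponential inequality.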
Lemma 3. For discrete group covariates $X_g$ and a discrete response $Y$, for any $0<\varepsilon<1$ and under Condition (C1), we have $P(|\hat e_g-e_g|>2\varepsilon)\le O(RJ^3)\exp\big\{-\frac{c_5 n\varepsilon^2}{R^2J^6}\big\}$, where $c_5$ is a positive constant.
Proof of Lemma 3. By the definitions of $e_g$ and $\hat e_g$ in Section 2.2, we have
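$$
\begin{aligned}
\log N_g\,(\hat e_g-e_g)
&=\Big[\Big(1-\sum_{r=1}^{R}\hat p_r^2\Big)-\sum_{j_g}^{J_g}\hat w_{j_g}\Big(1-\sum_{r=1}^{R}\hat p_{j_g r}^2\Big)\Big]-\Big[\Big(1-\sum_{r=1}^{R}p_r^2\Big)-\sum_{j_g}^{J_g}w_{j_g}\Big(1-\sum_{r=1}^{R}p_{j_g r}^2\Big)\Big]\\
&=\Big(\sum_{r=1}^{R}p_r^2-\sum_{r=1}^{R}\hat p_r^2\Big)+\Big(\sum_{j_g}^{J_g}w_{j_g}-\sum_{j_g}^{J_g}\hat w_{j_g}\Big)+\Big(\sum_{j_g}^{J_g}\hat w_{j_g}\sum_{r=1}^{R}\hat p_{j_g r}^2-\sum_{j_g}^{J_g}w_{j_g}\sum_{r=1}^{R}p_{j_g r}^2\Big)\\
&=\sum_{r=1}^{R}(p_r^2-\hat p_r^2)+\sum_{j_g}^{J_g}(w_{j_g}-\hat w_{j_g})+\sum_{j_g}^{J_g}\sum_{r=1}^{R}\big(\hat w_{j_g}\hat p_{j_g r}^2-w_{j_g}p_{j_g r}^2\big)\\
&=\sum_{r=1}^{R}(p_r-\hat p_r)(p_r+\hat p_r)+\sum_{j_g}^{J_g}(w_{j_g}-\hat w_{j_g})\\
&\quad+\sum_{j_g}^{J_g}\sum_{r=1}^{R}\big[(\hat w_{j_g}\hat p_{j_g r}+w_{j_g}p_{j_g r})(\hat p_{j_g r}-p_{j_g r})+\hat p_{j_g r}p_{j_g r}(\hat w_{j_g}-w_{j_g})\big]\\
&=I_1+I_2+I_3.
\end{aligned}
$$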
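Since $\log J\ge\log 2\ge 0.5$, we have $|\hat e_g-e_g|\le 2|I_1+I_2+I_3|$, and hence
$$P(|\hat e_g-e_g|>2\varepsilon)\le P\Big(|I_1|>\frac{\varepsilon}{3}\Big)+P\Big(|I_2|>\frac{\varepsilon}{3}\Big)+P\Big(|I_3|>\frac{\varepsilon}{3}\Big).$$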
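For $I_1$, we have
$$P\Big(|I_1|>\frac{\varepsilon}{3}\Big)\le\sum_{r=1}^{R}P\Big(|(p_r-\hat p_r)(p_r+\hat p_r)|>\frac{\varepsilon}{3}\Big)\le\sum_{r=1}^{R}P\Big(|p_r-\hat p_r|>\frac{c_1\varepsilon}{3RJ^3}\Big)\le RJ^3\cdot 2\exp\Big\{-\frac{6n\big(\frac{c_1\varepsilon}{3RJ^3}\big)^2}{3+4\big(\frac{c_1\varepsilon}{3RJ^3}\big)}\Big\}.$$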
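For $I_2$, we have
$$P\Big(|I_2|>\frac{\varepsilon}{3}\Big)\le\sum_{j_g}^{J_g}P\Big(|\hat w_{j_g}-w_{j_g}|>\frac{c_1\varepsilon}{3J^3}\Big)\le J^3\cdot 2\exp\Big\{-\frac{6n\big(\frac{c_1\varepsilon}{3J^3}\big)^2}{3+4\big(\frac{c_1\varepsilon}{3J^3}\big)}\Big\}.$$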
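For $I_3$, we have
$$
\begin{aligned}
I_3&=\sum_{j_g}^{J_g}\sum_{r=1}^{R}\big[(\hat w_{j_g}\hat p_{j_g r}+w_{j_g}p_{j_g r})(\hat p_{j_g r}-p_{j_g r})+\hat p_{j_g r}p_{j_g r}(\hat w_{j_g}-w_{j_g})\big]\\
&=\sum_{j_g}^{J_g}\sum_{r=1}^{R}\big[(\hat w_{j_g}\hat p_{j_g r}+w_{j_g}p_{j_g r})(\hat p_{j_g r}-p_{j_g r})\big]+\sum_{j_g}^{J_g}\sum_{r=1}^{R}\hat p_{j_g r}p_{j_g r}(\hat w_{j_g}-w_{j_g}):=I_{31}+I_{32}.
\end{aligned}
$$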
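For $I_{31}$ and $I_{32}$, we have
$$P\Big(|I_3|>\frac{\varepsilon}{3}\Big)\le P\Big(|I_{31}|>\frac{\varepsilon}{6}\Big)+P\Big(|I_{32}|>\frac{\varepsilon}{6}\Big),$$
$$P\Big(|I_{31}|>\frac{\varepsilon}{6}\Big)\le\sum_{j_g}^{J_g}\sum_{r=1}^{R}P\Big(\big|(\hat w_{j_g}\hat p_{j_g r}+w_{j_g}p_{j_g r})(\hat p_{j_g r}-p_{j_g r})\big|>\frac{\varepsilon}{6}\Big)\le\sum_{j_g}^{J_g}\sum_{r=1}^{R}P\Big(|\hat p_{j_g r}-p_{j_g r}|>\frac{c_1\varepsilon}{6RJ^3}\Big)\le RJ^3\cdot 2\exp\Big\{-\frac{6n\big(\frac{c_1\varepsilon}{6RJ^3}\big)^2}{3+4\big(\frac{c_1\varepsilon}{6RJ^3}\big)}\Big\},$$
$$P\Big(|I_{32}|>\frac{\varepsilon}{6}\Big)\le\sum_{j_g}^{J_g}\sum_{r=1}^{R}P\Big(\big|\hat p_{j_g r}p_{j_g r}(\hat w_{j_g}-w_{j_g})\big|>\frac{\varepsilon}{6}\Big)\le\sum_{j_g}^{J_g}\sum_{r=1}^{R}P\Big(|\hat w_{j_g}-w_{j_g}|>\frac{c_1\varepsilon}{6RJ^3}\Big)\le RJ^3\cdot 2\exp\Big\{-\frac{6n\big(\frac{c_1\varepsilon}{6RJ^3}\big)^2}{3+4\big(\frac{c_1\varepsilon}{6RJ^3}\big)}\Big\}.$$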
Combining these bounds, we obtain
$$P(|\hat e_g-e_g|>2\varepsilon)\le O(RJ^3)\exp\Big\{-\frac{c_5 n\varepsilon^2}{R^2J^6}\Big\},$$
where $c_5$ is a positive constant.
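The algebraic decomposition into $I_1+I_2+I_3$ in the proof of Lemma 3 holds for arbitrary numbers, not only for probabilities, so it can be checked numerically. The sketch below works with the unnormalized difference $\log N_g(\hat e_g-e_g)$. It uses randomly generated "true" and "estimated" quantities, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def unnormalized_index(p_r: Sequence[float], w: Sequence[float],
                       p_jr: Sequence[Sequence[float]]) -> float:
    """(1 - sum_r p_r^2) - sum_j w_j (1 - sum_r p_{jr}^2)."""
    return (1 - sum(x * x for x in p_r)) - sum(
        wj * (1 - sum(x * x for x in row)) for wj, row in zip(w, p_jr))


def decomposition(p_r, w, p_jr, ph_r, wh, ph_jr) -> Tuple[float, float, float]:
    """Terms I1, I2, I3 from the proof of Lemma 3 (the ph_*/wh arguments are the estimates)."""
    i1 = sum((a - b) * (a + b) for a, b in zip(p_r, ph_r))
    i2 = sum(a - b for a, b in zip(w, wh))
    i3 = sum((wh_j * q_hat + w_j * q) * (q_hat - q) + q_hat * q * (wh_j - w_j)
             for w_j, wh_j, row, row_hat in zip(w, wh, p_jr, ph_jr)
             for q, q_hat in zip(row, row_hat))
    return i1, i2, i3


def random_prob(k: int, rng: random.Random) -> List[float]:
    """A random probability vector of length k with positive entries."""
    raw = [rng.random() + 1e-3 for _ in range(k)]
    total = sum(raw)
    return [x / total for x in raw]
```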
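Proof of Theorem 3.1. By Conditions (C1) to (C3) and Lemma 3, we get
$$
\begin{aligned}
P(D\subseteq\hat D)&\ge P\big(|\hat e_g-e_g|\le cn^{-\tau},\ \forall g\in D\big)\ge P\Big(\max_{1\le g\le G}|\hat e_g-e_g|\le cn^{-\tau}\Big)\\
&\ge 1-\sum_{g=1}^{G}P\big(|\hat e_g-e_g|>cn^{-\tau}\big)\ge 1-O(RJ^3)\,p\exp\Big\{-\frac{c_5c^2n^{1-2\tau}}{R^2J^6}\Big\}\\
&\ge 1-O\Big(p\exp\big\{-bn^{1-2\tau-2\varepsilon-2\kappa}+(\varepsilon+\kappa)\log n\big\}\Big),
\end{aligned}
$$
where $b$ is a positive constant.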
Lemma 4 (Lemma A.5 [7]). Under Conditions (C1), (C4) and (C5), for continuous $X_g$ and any $0<\varepsilon<1$, there exists a positive constant $c_6$ such that $P(|\hat e_g-e_g|>2\varepsilon)\le O(RN_g)\exp\big\{-\frac{c_6n^{1-2\rho}\varepsilon^2}{R^4N_g^4}\big\}$.
Proof of Theorem 3.2. With Lemma 4 in place of Lemma 3, the proof is the same as that of Theorem 3.1 and is omitted.
Proof of Theorem 3.3. By Lemmas 3 and 4 and under Conditions (C1), (C4), (C5) and (C7), we get
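$$
\begin{aligned}
P\Big(\min_{g\in D}\hat e_g-\max_{g\in I}\hat e_g<\frac{\delta}{2}\Big)
&\le P\Big(\Big(\min_{g\in D}\hat e_g-\max_{g\in I}\hat e_g\Big)-\Big(\min_{g\in D}e_g-\max_{g\in I}e_g\Big)<-\frac{\delta}{2}\Big)\\
&\le P\Big(\Big|\Big(\min_{g\in D}\hat e_g-\max_{g\in I}\hat e_g\Big)-\Big(\min_{g\in D}e_g-\max_{g\in I}e_g\Big)\Big|>\frac{\delta}{2}\Big)\\
&\le P\Big(\max_{1\le g\le G}|\hat e_g-e_g|>\frac{\delta}{4}\Big)\\
&\le O(RJ_g)\,p\exp\Big\{-\frac{c_7n^{1-2\rho}}{R^4J_g^4}\Big\}
=O\Big(\exp\Big\{\log(RN_g)+\log p-\frac{c_7n^{1-2\rho}}{R^4N_g^4}\Big\}\Big),
\end{aligned}
$$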
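where $c_7=\min\{c_5,c_6\}\,\delta^2/4$. Since $\log(RN_g)/\log n=O(1)$, there exists a positive constant $c_8$ such that $\log(RN_g)\le c_8\log n$. Also, $\max\{\log p,\log n\}R^4N_g^4/n^{1-2\rho}=O(1)$ implies that $\log p\le\frac{1}{2}c_7\frac{n^{1-2\rho}}{R^4N_g^4}$ and $\frac{1}{2}c_7\frac{n^{1-2\rho}}{R^4N_g^4}\ge(c_8+2)\log n$ for large $n$. Hence there exists a constant $n_0$ such that
$$\sum_{n=n_0}^{\infty}\exp\Big\{\log(RN_g)+\log p-\frac{c_7n^{1-2\rho}}{R^4N_g^4}\Big\}\le\sum_{n=n_0}^{\infty}\exp\Big\{c_8\log n-\frac{1}{2}\frac{c_7n^{1-2\rho}}{R^4N_g^4}\Big\}\le\sum_{n=n_0}^{\infty}\exp\{c_8\log n-(c_8+2)\log n\}=\sum_{n=n_0}^{\infty}n^{-2}<\infty.$$
Following Ni and Fang [12] and applying the Borel-Cantelli lemma, we obtain $\liminf_{n\to\infty}\{\min_{g\in D}\hat e_g-\max_{g\in I}\hat e_g\}\ge\delta/2>0$ almost surely.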
[1] | M. Fečkan, J. Wang, M. Pospíšil, Fractional-Order Equations and Inclusions, De Gruyter, 2017. |
[2] | J. Wang, M. Fečkan, Y. Zhou, A survey on impulsive fractional differential equations, Fract. Calc. Appl. Anal., 19 (2016), 806-831. |
[3] | A. A. Kilbas, H. M. Srivastava, J. J. Trujillo, Theory and Applications of Fractional Differential Equations, Elsevier Science, Amsterdam, 2006. |
[4] | Y. Zhou, Attractivity for fractional differential equations in Banach space, Appl. Math. Lett., 75 (2018), 1-6. doi: 10.1016/j.aml.2017.06.008 |
[5] | Y. Zhou, Attractivity for fractional evolution equations with almost sectorial operators, Fract. Calc. Appl. Anal., 21 (2018), 786-800. doi: 10.1515/fca-2018-0041 |
[6] | Y. Zhou, L. Shangerganesh, J. Manimaran, et al., A class of time-fractional reaction-diffusion equation with nonlocal boundary condition, Math. Meth. Appl. Sci., 41 (2018), 2987-2999. doi: 10.1002/mma.4796 |
[7] | E. Kaslik, S. Sivasundaram, Non-existence of periodic solutions in fractional-order dynamical systems and a remarkable difference between integer and fractional-order derivatives of periodic functions, Nonlinear Analysis: Real World Applications, 13 (2012), 1489-1497. doi: 10.1016/j.nonrwa.2011.11.013 |
[8] | J. Wang, M. Fečkan, Y. Zhou, Nonexistence of periodic solutions and asymptotically periodic solutions for fractional differential equations, Commun. Nonlinear Sci., 18 (2013), 246-256. doi: 10.1016/j.cnsns.2012.07.004 |
[9] | L. Ren, J. Wang, M. Feckan, Asymptotically periodic solutions for Caputo type fractional evolution equations, Fract. Calc. Appl. Anal., 21 (2018), 1294-1312. doi: 10.1515/fca-2018-0068 |
[10] | C. Cuevas, J. Souza, S-asymptotically ω-periodic solutions of semilinear fractional integrodifferential equations, Appl. Math. Lett., 22 (2009), 865-870. doi: 10.1016/j.aml.2008.07.013 |
[11] | J. Diblík, M. Fečkan, M. Pospíšil, Nonexistence of periodic solutions and S-asymptotically periodic solutions in fractional difference equations, Appl. Math. Comput., 257 (2015), 230-240. |
[12] | M. Fečkan, J. Wang, Periodic impulsive fractional differential equations, Adv. Nonlinear Anal., 8 (2019), 482-496. |
[13] | Y. Zhou, F. Jiao, Existence of mild solutions for fractional neutral evolution equations, Comput. Math. Appl., 59 (2010), 1063-1077. doi: 10.1016/j.camwa.2009.06.026 |
[14] | M. Fečkan, J. Wang, Y. Zhou, Periodic solutions for nonlinear evolution equations with noninstantaneous impulses, Nonautonomous Dynamical Systems, 1 (2014), 93-101. |
[15] | H. Ye, J. Gao, Y. Ding, A generalized Gronwall inequality and its application to a fractional differential equation, J. Math. Anal. Appl., 328 (2007), 1075-1081. doi: 10.1016/j.jmaa.2006.05.061 |
[16] | B. M. Garay, P. Kloeden, Discretization near compact invariant sets, Random and Computational Dynamics, 5 (1997), 93-124. |
[17] | I. Murdock, C. Robinson, Qualitative dynamics from asymptotic expansions: Local theory, J. Differential Equations, 36 (1980), 425-441. doi: 10.1016/0022-0396(80)90059-5 |
Y | X11 | ⋯ | X1p1 | ⋯ | Xg1 | ⋯ | Xgpg | ⋯ | XG1 | ⋯ | XGpG |
Y1 | x1,11 | ⋯ | x1,1p1 | ⋯ | x1,g1 | ⋯ | x1,gpg | ⋯ | x1,G1 | ⋯ | x1,GpG |
⋮ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ | ⋱ | ⋮ |
Yn | xn,11 | ⋯ | xn,1p1 | ⋯ | xn,g1 | ⋯ | xn,gpg | ⋯ | xn,G1 | ⋯ | xn,GpG |
Condition | MMS | CP | ||||||||
5% | 25% | 50% | 75% | 95% | CP1 | CP2 | CP3 | CPa | ||
Balanced Y, n=80, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 89.8 | 120.0 | 162.5 | 193.0 | 231.1 | 0.72 | 0.79 | 0.80 | 0.00 | |
Balanced Y, n=100, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 17.0 | 22.0 | 26.0 | 31.0 | 40.1 | 0.89 | 0.97 | 1.00 | 1.00 | |
Balanced Y, n=120, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 10.0 | 11.0 | 13.0 | 15.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
UnBalanced Y, n=80, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 4.0 | 5.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 130.9 | 197.0 | 226.0 | 282.5 | 373.4 | 0.66 | 0.79 | 0.84 | 0.00 | |
UnBalanced Y, n=100, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 4.0 | 0.89 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 11.0 | 15.0 | 17.0 | 20.0 | 27.0 | 0.90 | 0.99 | 1.00 | 1.00 | |
UnBalanced Y, n=120, p=1500 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 10.0 | 10.0 | 12.0 | 1.00 | 1.00 | 1.00 | 1.00 |
Condition | MMS | CP | ||||||||
5% | 25% | 50% | 75% | 95% | CP1 | CP2 | CP3 | CPa | ||
Balanced Y, n=100, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 10.0 | 45.1 | 0.99 | 0.99 | 0.99 | 0.95 | |
Balanced Y, n=150, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
Balanced Y, n=200, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
UnBalanced Y, n=100, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 10.0 | 10.0 | 12.0 | 0.66 | 0.79 | 0.84 | 0.00 | |
UnBalanced Y, n=150, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
UnBalanced Y, n=200, p=2000 | ||||||||||
GP-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
GIG-SIS | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 | 1.00 | 1.00 | 1.00 | 1.00 | |
IG-SIS | 9.0 | 9.0 | 9.0 | 9.0 | 9.0 | 1.00 | 1.00 | 1.00 | 1.00 |
θrk | K | |||||||||||
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | |
r=1 | 0.2 | 0.8 | 0.7 | 0.2 | 0.2 | 0.9 | 0.1 | 0.1 | 0.7 | 0.7 | 0.3 | 0.5 |
r=2 | 0.9 | 0.3 | 0.3 | 0.7 | 0.8 | 0.4 | 0.7 | 0.6 | 0.4 | 0.4 | 0.8 | 0.2 |
r=3 | 0.1 | 0.9 | 0.9 | 0.1 | 0.3 | 0.1 | 0.4 | 0.3 | 0.6 | 0.6 | 0.4 | 0.7 |
r=4 | 0.7 | 0.2 | 0.2 | 0.6 | 0.7 | 0.6 | 0.8 | 0.9 | 0.1 | 0.1 | 0.8 | 0.6 |
Condition / method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa |
Balanced Y, n=180, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 15.0 | 22.0 | 32.5 | 52.8 | 92.1 | 0.94 | 0.98 | 0.99 | 0.88 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 13.0 | 16.0 | 20.0 | 31.3 | 79.1 | 0.97 | 0.99 | 0.99 | 0.93 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 14.0 | 17.0 | 22.0 | 39.3 | 123.1 | 0.96 | 0.99 | 0.99 | 0.87 |
Balanced Y, n=220, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 13.0 | 14.0 | 16.0 | 19.0 | 26.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 13.0 | 15.0 | 18.0 | 21.0 | 29.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 14.0 | 17.0 | 22.0 | 26.0 | 37.4 | 0.99 | 1.00 | 1.00 | 1.00 |
Balanced Y, n=260, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 12.0 | 12.0 | 14.0 | 15.0 | 18.1 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 12.0 | 13.0 | 14.0 | 17.0 | 21.1 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 12.0 | 14.0 | 16.0 | 19.0 | 28.1 | 1.00 | 1.00 | 1.00 | 1.00
Condition / method | MMS 5% | MMS 25% | MMS 50% | MMS 75% | MMS 95% | CP1 | CP2 | CP3 | CPa |
Unbalanced Y, n=180, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 39.0 | 43.0 | 46.0 | 50.3 | 55.1 | 0.84 | 0.97 | 1.00 | 1.00 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 22.0 | 26.0 | 29.0 | 32.0 | 35.1 | 0.93 | 1.00 | 1.00 | 1.00 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 21.0 | 23.0 | 26.0 | 29.0 | 33.0 | 0.95 | 1.00 | 1.00 | 1.00 |
Unbalanced Y, n=220, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 16.0 | 18.0 | 20.0 | 22.0 | 25.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 13.0 | 14.0 | 16.0 | 17.0 | 18.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 15.0 | 17.0 | 19.0 | 20.0 | 22.1 | 1.00 | 1.00 | 1.00 | 1.00 |
Unbalanced Y, n=260, p=3000 |
GP-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-4 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-4 | 12.0 | 12.0 | 13.0 | 13.0 | 14.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-8 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-8 | 12.0 | 12.0 | 13.0 | 13.0 | 14.1 | 1.00 | 1.00 | 1.00 | 1.00 |
GP-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
GIG-SIS-10 | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 | 1.00 | 1.00 | 1.00 | 1.00 |
IG-SIS-10 | 12.0 | 13.0 | 13.0 | 14.0 | 15.0 | 1.00 | 1.00 | 1.00 | 1.00
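The MMS and CP columns in the tables above summarize where the active predictors land in each replication's screening ranking. As a rough illustration, here is a minimal Python sketch assuming the definitions common in the screening literature: MMS is the smallest number of top-ranked predictors that still contains every active predictor, CP_j is the fraction of replications in which active predictor j is ranked within the top d, and CPa is the fraction in which all active predictors are. The function names, the toy rankings, the cutoff `d`, and the quantile convention (linear interpolation via `method="inclusive"`) are assumptions for illustration, not the authors' code.

```python
import statistics
from collections.abc import Sequence


def minimum_model_size(ranking: Sequence[int], active: set[int]) -> int:
    """Smallest d such that the top-d features of `ranking` contain all of `active`."""
    # 1-based position of every feature in the ranking (best first)
    position = {feature: rank + 1 for rank, feature in enumerate(ranking)}
    # the last active feature to appear determines the model size
    return max(position[j] for j in active)


def mms_quantiles(mms_values: Sequence[int]) -> dict[int, float]:
    """5%, 25%, 50%, 75% and 95% quantiles of MMS across replications (needs >= 2 values)."""
    # n=20 gives cut points at 5%, 10%, ..., 95%; "inclusive" interpolates linearly
    cuts = statistics.quantiles(mms_values, n=20, method="inclusive")
    return {pct: cuts[pct // 5 - 1] for pct in (5, 25, 50, 75, 95)}


def coverage(rankings: Sequence[Sequence[int]], active: set[int], d: int):
    """Per-feature coverage CP_j and joint coverage CPa at a (hypothetical) cutoff d."""
    hits = {j: 0 for j in active}
    all_hits = 0
    for ranking in rankings:
        top_d = set(ranking[:d])       # features retained by the screening step
        for j in active:
            hits[j] += j in top_d      # bool counts as 0 or 1
        all_hits += active <= top_d    # True only if every active feature is retained
    n_rep = len(rankings)
    return {j: h / n_rep for j, h in hits.items()}, all_hits / n_rep


# Toy example: three replications, active predictors {0, 1, 2}
rankings = [[0, 1, 2, 3, 4, 5], [2, 0, 5, 1, 3, 4], [0, 3, 1, 2, 4, 5]]
active = {0, 1, 2}
mms = [minimum_model_size(r, active) for r in rankings]
print(mms)                     # [3, 4, 4]
print(mms_quantiles(mms)[50])  # 4.0
print(coverage(rankings, active, d=3))
```

Under these definitions an MMS equal to the number of active predictors (as for GP-SIS and GIG-SIS above) means every active predictor is ranked ahead of all inactive ones in that replication.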
Screening method | p=1500 | p=2500 | p=3500 | p=4500 | p=5500 | p=6500 | p=7500 | p=8500 | p=9500 | p=10500 |
GP-SIS | 2.636 | 4.202 | 6.173 | 6.617 | 8.124 | 9.275 | 10.705 | 12.135 | 17.139 | 15.757 |
IG-SIS | 3.145 | 5.127 | 7.512 | 7.923 | 9.696 | 10.967 | 12.629 | 14.296 | 19.841 | 18.478 |
GIG-SIS | 3.677 | 6.283 | 8.933 | 9.393 | 11.532 | 13.085 | 15.079 | 17.103 | 24.702 | 22.291 |
Metric | Screening method | Response 1 | Response 2 | Response 3 | Response 4 | Response 5 |
Classification method: SVM |
G-mean (train data) | GP-SIS | 0.9986 | 0.7995 | 0.8108 | 0.8131 | 0.7801 |
G-mean (train data) | GIG-SIS | 0.9979 | 0.7903 | 0.8095 | 0.8111 | 0.7768 |
G-mean (train data) | IG-SIS | 0.9992 | 0.7914 | 0.7865 | 0.8084 | 0.7738 |
G-mean (test data) | GP-SIS | 0.9941 | 0.9790 | 0.9825 | 0.9973 | 0.9888 |
G-mean (test data) | GIG-SIS | 0.9954 | 0.9773 | 0.9937 | 1.0000 | 0.9947 |
G-mean (test data) | IG-SIS | 0.9959 | 0.9841 | 0.9758 | 1.0000 | 0.9943 |
F-measure (train data) | GP-SIS | 0.9762 | 0.8080 | 0.8518 | 0.8576 | 0.6432 |
F-measure (train data) | GIG-SIS | 0.9635 | 0.6843 | 0.7873 | 0.7942 | 0.5259 |
F-measure (train data) | IG-SIS | 0.9480 | 0.6295 | 0.5900 | 0.7255 | 0.4425 |
F-measure (test data) | GP-SIS | 0.8946 | 0.3352 | 0.4203 | 0.4828 | 0.0900 |
F-measure (test data) | GIG-SIS | 0.9208 | 0.3433 | 0.5683 | 0.5446 | 0.2000 |
F-measure (test data) | IG-SIS | 0.9116 | 0.3555 | 0.3422 | 0.5502 | 0.2400 |
Classification method: DT |
G-mean (train data) | GP-SIS | 0.9944 | 0.7861 | 0.7976 | 0.7988 | 0.7511 |
G-mean (train data) | GIG-SIS | 0.9931 | 0.7715 | 0.7829 | 0.7914 | 0.7449 |
G-mean (train data) | IG-SIS | 0.9952 | 0.7821 | 0.7777 | 0.8013 | 0.7471 |
G-mean (test data) | GP-SIS | 0.9907 | 0.9896 | 0.9849 | 0.9949 | 0.9829 |
G-mean (test data) | GIG-SIS | 0.9927 | 0.9646 | 0.9831 | 0.9891 | 0.9816 |
G-mean (test data) | IG-SIS | 0.9917 | 0.9840 | 0.9751 | 1.0000 | 0.9829 |
F-measure (train data) | GP-SIS | 0.9190 | 0.5290 | 0.5988 | 0.6085 | 0.0000 |
F-measure (train data) | GIG-SIS | 0.89029 | 0.3551 | 0.4698 | 0.5202 | 0.0343 |
F-measure (train data) | IG-SIS | 0.9057 | 0.4776 | 0.4432 | 0.5929 | 0.0000 |
F-measure (test data) | GP-SIS | 0.8836 | 0.4005 | 0.4233 | 0.4779 | 0.0000 |
F-measure (test data) | GIG-SIS | 0.8459 | 0.1708 | 0.3441 | 0.3847 | 0.0000 |
F-measure (test data) | IG-SIS | 0.8745 | 0.3195 | 0.2819 | 0.4683 | 0.0000 |
Classification method: RF |
G-mean (train data) | GP-SIS | 1.0000 | 0.8099 | 0.8188 | 0.8166 | 0.7848 |
G-mean (train data) | GIG-SIS | 1.0000 | 0.8099 | 0.8187 | 0.8166 | 0.7847 |
G-mean (train data) | IG-SIS | 1.0000 | 0.8045 | 0.8098 | 0.8143 | 0.7818 |
G-mean (test data) | GP-SIS | 0.9963 | 0.9819 | 0.9884 | 0.9975 | 0.9834 |
G-mean (test data) | GIG-SIS | 0.9978 | 0.9598 | 0.9819 | 0.9913 | 0.9859 |
G-mean (test data) | IG-SIS | 0.9958 | 0.9816 | 0.9761 | 1.0000 | 0.9975 |
F-measure (train data) | GP-SIS | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
F-measure (train data) | GIG-SIS | 1.0000 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
F-measure (train data) | IG-SIS | 0.9846 | 0.8772 | 0.8925 | 0.9025 | 0.7340 |
F-measure (test data) | GP-SIS | 0.9036 | 0.3533 | 0.5067 | 0.5138 | 0.0000 |
F-measure (test data) | GIG-SIS | 0.8856 | 0.1300 | 0.3968 | 0.4885 | 0.0286 |
F-measure (test data) | IG-SIS | 0.9137 | 0.3605 | 0.3722 | 0.5303 | 0.2567
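G-mean and F-measure in the classification table are both derived from a confusion matrix but weight errors differently. The Python sketch below assumes the common binary definitions: G-mean as the geometric mean of sensitivity and specificity, and F-measure as the harmonic mean of precision and recall for the positive class. This section does not state whether the paper uses exactly these forms or a multi-class generalization, and the function names and toy labels are hypothetical.

```python
import math
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def g_mean(y_true, y_pred, positive=1) -> float:
    """Geometric mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_measure(y_true, y_pred, positive=1) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    # no positives predicted and none present: report 0.0 instead of dividing by zero
    return 2 * tp / denom if denom else 0.0


# Toy imbalanced data: 10 positives, 90 negatives
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 85  # TP=8, FN=2, FP=5, TN=85
print(round(g_mean(y_true, y_pred), 4))     # 0.8692
print(round(f_measure(y_true, y_pred), 4))  # 0.6957
```

Because G-mean also rewards correct classification of the majority class while F-measure depends only on the positive class, the two can diverge sharply when the positive class is rare, as the toy example shows.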