Research article · Special Issues

MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data

  • Received: 07 March 2024 · Revised: 23 April 2024 · Accepted: 24 April 2024 · Published: 21 May 2024
  • MSC: 68T20

  • Learning from imbalanced data is a challenging task in machine learning, as many traditional supervised learning algorithms tend to favor the majority class at the expense of the minority class on such data. Stacking ensemble, which builds an ensemble by using a meta-learner to combine the predictions of multiple base classifiers, has been used to address class imbalance learning. Specifically, in this context, a stacking ensemble is usually combined with one specific sampling algorithm. Such a design, however, can be suboptimal: relying on a single sampling strategy makes it difficult to acquire sufficiently diverse features, and feeding all of the resulting features to the meta-learner can harm it, since some of them may be noisy or redundant. To address these problems, we propose a novel stacking ensemble learning algorithm named MSFSS, which divides the learning procedure into two phases. The first phase combines multiple sampling algorithms with multiple supervised learning approaches by cross combination to construct the meta feature space, ensuring the diversity of the stacking ensemble. The second phase adopts the whale optimization algorithm (WOA) to select the optimal sub-feature combination from the meta feature space, further improving the quality of the features. Finally, a linear regression classifier is trained as the meta-learner to produce the final prediction. Experimental results on 40 benchmark imbalanced datasets showed that MSFSS significantly outperformed several popular and state-of-the-art class imbalance ensemble learning algorithms: out of the 40 datasets, it achieved the best F-measure on 27 and the best G-mean on 26. Although it consumed more running time than several competitors, the increase was acceptable. These results indicate the effectiveness and superiority of the proposed MSFSS algorithm.

    Citation: Shuxiang Wang, Changbin Shao, Sen Xu, Xibei Yang, Hualong Yu. MSFSS: A whale optimization-based multiple sampling feature selection stacking ensemble algorithm for classifying imbalanced data[J]. AIMS Mathematics, 2024, 9(7): 17504-17530. doi: 10.3934/math.2024851
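
    To make the two-phase procedure concrete, the following is a minimal Python sketch of the idea described in the abstract, not the authors' implementation: the particular samplers (SMOTE, random under-sampling), base learners (kNN, decision tree, naive Bayes), the wrapper fitness, the population size and iteration count, and the substitution of logistic regression for the paper's linear-regression meta-learner are all illustrative assumptions, and the whale-style search keeps only the shrinking-encircling and spiral moves of the full WOA.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import f1_score
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

rng = np.random.default_rng(0)

# A stand-in imbalanced dataset (the paper uses 40 KEEL/UCI benchmarks).
X, y = make_classification(n_samples=1500, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_base, X_meta, y_base, y_meta = train_test_split(X_tr, y_tr, stratify=y_tr, random_state=0)

# Phase 1: cross-combine samplers with base learners; each (sampler, learner)
# pair contributes one meta-feature column (its predicted minority probability).
samplers = [SMOTE(random_state=0), RandomUnderSampler(random_state=0)]
learners = [KNeighborsClassifier(), DecisionTreeClassifier(random_state=0), GaussianNB()]
meta_cols, test_cols = [], []
for sampler in samplers:
    Xs, ys = sampler.fit_resample(X_base, y_base)
    for learner in learners:
        model = learner.fit(Xs, ys)
        meta_cols.append(model.predict_proba(X_meta)[:, 1])
        test_cols.append(model.predict_proba(X_te)[:, 1])
Z_meta, Z_te = np.column_stack(meta_cols), np.column_stack(test_cols)

# Phase 2: choose a meta-feature subset with a simplified binary WOA.
# A column is kept when its (continuous) position exceeds 0.5.
def fitness(mask):
    if not mask.any():
        return 0.0
    clf = LogisticRegression().fit(Z_meta[:, mask], y_meta)
    return f1_score(y_meta, clf.predict(Z_meta[:, mask]))  # a held-out fold would be safer

n_whales, n_iter, F = 8, 20, Z_meta.shape[1]
pos = rng.random((n_whales, F))
best, best_fit = pos[0].copy(), -1.0
for t in range(n_iter):
    for w in pos:                           # track the best whale found so far
        f = fitness(w > 0.5)
        if f > best_fit:
            best_fit, best = f, w.copy()
    a = 2.0 * (1.0 - t / n_iter)            # control parameter decreasing 2 -> 0
    for i in range(n_whales):
        if rng.random() < 0.5:              # shrinking encircling: X <- X* - A|C X* - X|
            A = a * (2 * rng.random(F) - 1)
            C = 2 * rng.random(F)
            pos[i] = np.clip(best - A * np.abs(C * best - pos[i]), 0, 1)
        else:                               # logarithmic spiral around the best whale
            l = rng.uniform(-1, 1)
            pos[i] = np.clip(np.abs(best - pos[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best, 0, 1)

# Meta-learner on the selected columns (logistic regression substituted for the
# paper's linear-regression meta-learner to keep a classifier API).
keep = best > 0.5
if not keep.any():
    keep[:] = True
meta_learner = LogisticRegression().fit(Z_meta[:, keep], y_meta)
print("kept:", np.flatnonzero(keep), "test F1:", f1_score(y_te, meta_learner.predict(Z_te[:, keep])))
```

    The essential structure survives even in this reduced form: each (sampler, learner) pair contributes one meta-feature, and the optimizer prunes that pool before the meta-learner is fit.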




    The idea of statistical convergence was given by Zygmund [1] in the first edition of his monograph published in Warsaw in 1935. The concept of statistical convergence was introduced by Steinhaus [2] and Fast [3] and then reintroduced independently by Schoenberg [4]. Over the years and under different names, statistical convergence has been discussed in the Theory of Fourier Analysis, Ergodic Theory, Number Theory, Measure Theory, Trigonometric Series, Turnpike Theory and Banach Spaces. Later on it was further investigated from the sequence spaces point of view and linked with summability theory by Bilalov and Nazarova [5], Braha et al. [6], Cinar et al. [7], Colak [8], Connor [9], Et et al. ([10,11,12,13,14]), Fridy [15], Isik et al. ([16,17,18]), Kayan et al. [19], Kucukaslan et al. ([20,21]), Mohiuddine et al. [22], Nuray [23], Nuray and Aydın [24], Salat [25], Sengul et al. ([26,27,28,29]), Srivastava et al. ([30,31]) and many others.

    The idea of statistical convergence depends upon the density of subsets of the set $\mathbb{N}$ of natural numbers. The density of a subset $E$ of $\mathbb{N}$ is defined by

    $$\delta(E)=\lim_{n\to\infty}\frac{1}{n}\sum_{k=1}^{n}\chi_{E}(k),$$

    provided that the limit exists, where $\chi_{E}$ is the characteristic function of the set $E$. It is clear that any finite subset of $\mathbb{N}$ has zero natural density (for example, $\delta(2\mathbb{N})=\frac{1}{2}$ for the even numbers, while the set of perfect squares has density $0$), and that

    $$\delta(E^{c})=1-\delta(E).$$

    A sequence $x=(x_{k})_{k\in\mathbb{N}}$ is said to be statistically convergent to $L$ if, for every $\varepsilon>0$, we have

    $$\delta\left(\{k\in\mathbb{N}:|x_{k}-L|\geq\varepsilon\}\right)=0.$$

    In this case, we write

    $$x_{k}\overset{\text{stat}}{\longrightarrow}L\ \ \text{as}\ k\to\infty\qquad\text{or}\qquad S\text{-}\lim_{k\to\infty}x_{k}=L.$$
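
    As a quick numerical illustration (our own example, not taken from the paper), the snippet below estimates these density expressions for the sequence that equals $1$ at perfect squares and $\frac{1}{k}$ elsewhere; it is statistically convergent to $0$ because the squares have natural density zero.

```python
import numpy as np

# Empirical version of delta(E) = lim (1/n) * sum_{k<=n} chi_E(k).
def density(indicator, n):
    k = np.arange(1, n + 1)
    return indicator(k).mean()

n = 10**6
k = np.arange(1, n + 1)
x = 1.0 / k
x[np.round(np.sqrt(k)).astype(int) ** 2 == k] = 1.0   # x_k = 1 at perfect squares

eps = 1e-3
# delta({k : |x_k - 0| >= eps}): squares (density 0) plus the finitely many k <= 1/eps
print(np.mean(np.abs(x) >= eps))              # ~0.002 at n = 10**6, shrinking toward 0
print(density(lambda k: k % 2 == 0, n))       # even numbers: ~0.5
print(density(lambda k: np.round(np.sqrt(k)).astype(int) ** 2 == k, n))  # squares: ~0.001
```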

    In 1932, Agnew [32] introduced the concept of the deferred Cesàro mean of real (or complex) valued sequences $x=(x_{k})$, defined by

    $$(D_{p,q}x)_{n}=\frac{1}{q_{n}-p_{n}}\sum_{k=p_{n}+1}^{q_{n}}x_{k},\qquad n=1,2,3,\dots,$$

    where $p=(p_{n})$ and $q=(q_{n})$ are sequences of non-negative integers satisfying

    $$p_{n}<q_{n}\quad\text{and}\quad\lim_{n\to\infty}q_{n}=\infty.\tag{1}$$
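
    The deferred Cesàro mean is easy to compute directly; a small Python helper (our illustration, with the sequence stored 1-indexed in a NumPy array) makes the summation window explicit.

```python
import numpy as np

def deferred_cesaro_mean(x, p_n, q_n):
    """(D_{p,q} x)_n = (1/(q_n - p_n)) * sum_{k = p_n + 1}^{q_n} x_k.

    x is a NumPy array holding x_1, x_2, ..., so x[p_n:q_n] is x_{p_n+1}, ..., x_{q_n}."""
    return x[p_n:q_n].sum() / (q_n - p_n)

x = 1.0 / np.arange(1, 101)                       # x_k = 1/k for k = 1..100
print(deferred_cesaro_mean(x, p_n=0, q_n=100))    # q_n = n, p_n = 0: ordinary Cesaro mean
print(deferred_cesaro_mean(x, p_n=50, q_n=100))   # a genuinely deferred window (50, 100]
```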

    Let $K$ be a subset of $\mathbb{N}$ and denote the set $\{k:k\in(p_{n},q_{n}],\ k\in K\}$ by $K_{p,q}(n)$.

    The deferred density of $K$ is defined by

    $$\delta_{p,q}(K)=\lim_{n\to\infty}\frac{1}{q_{n}-p_{n}}\left|K_{p,q}(n)\right|,$$

    provided that the limit exists, where the vertical bars indicate the cardinality of the enclosed set $K_{p,q}(n)$. If $q_{n}=n$ and $p_{n}=0$, then the deferred density coincides with the natural density of $K$.

    A real valued sequence $x=(x_{k})$ is said to be deferred statistically convergent to $L$ if, for each $\varepsilon>0$,

    $$\lim_{n\to\infty}\frac{1}{q_{n}-p_{n}}\left|\{k\in(p_{n},q_{n}]:|x_{k}-L|\geq\varepsilon\}\right|=0.$$

    In this case we write $S_{p,q}$-$\lim x_{k}=L$. If $q_{n}=n$ and $p_{n}=0$ for all $n\in\mathbb{N}$, then deferred statistical convergence coincides with usual statistical convergence [20].
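
    A short sketch (again our own illustration, with an assumed example sequence) of the deferred statistical-defect quantity shows the reduction to ordinary statistical convergence when $q_{n}=n$ and $p_{n}=0$, and how a genuinely deferred window behaves.

```python
import numpy as np

def deferred_stat_defect(x, L, eps, p_n, q_n):
    """(1/(q_n - p_n)) * |{k in (p_n, q_n] : |x_k - L| >= eps}| for a 1-indexed array x."""
    window = x[p_n:q_n]                           # entries x_{p_n+1}, ..., x_{q_n}
    return np.mean(np.abs(window - L) >= eps)

n = 10**6
k = np.arange(1, n + 1)
x = np.where(np.round(np.sqrt(k)).astype(int) ** 2 == k, 1.0, 1.0 / k)

print(deferred_stat_defect(x, L=0.0, eps=1e-3, p_n=0, q_n=n))       # q_n = n, p_n = 0: ~0.002
print(deferred_stat_defect(x, L=0.0, eps=1e-3, p_n=n // 2, q_n=n))  # deferred window: ~0.0006
```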

    In this section, we give some inclusion relations between statistical convergence of order $\alpha$, deferred strong Cesàro summability of order $\alpha$, and deferred statistical convergence of order $\alpha$ in general metric spaces.

    Definition 1. Let $(X,d)$ be a metric space, $(p_{n})$ and $(q_{n})$ be two sequences as above, and $0<\alpha\leq 1$. A metric valued sequence $x=(x_{k})$ is said to be $S_{p,q}^{d,\alpha}$-convergent (or deferred $d$-statistically convergent of order $\alpha$) to $x_{0}$ if there is $x_{0}\in X$ such that, for every $\varepsilon>0$,

    $$\lim_{n\to\infty}\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|=0,$$

    where $B_{\varepsilon}(x_{0})=\{x\in X:d(x,x_{0})<\varepsilon\}$ is the open ball of radius $\varepsilon$ centered at $x_{0}$. In this case we write $S_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ or $x_{k}\to x_{0}\,(S_{p,q}^{d,\alpha})$. The set of all deferred $d$-statistically convergent sequences of order $\alpha$ will be denoted by $S_{p,q}^{d,\alpha}$. If $q_{n}=n$ and $p_{n}=0$, then deferred $d$-statistical convergence of order $\alpha$ coincides with $d$-statistical convergence of order $\alpha$, denoted by $S^{d,\alpha}$. In the special case $q_{n}=n$, $p_{n}=0$ and $\alpha=1$, deferred $d$-statistical convergence of order $\alpha$ coincides with $d$-statistical convergence, denoted by $S^{d}$.

    Definition 2. Let $(X,d)$ be a metric space, $(p_{n})$ and $(q_{n})$ be two sequences as above, and $0<\alpha\leq 1$. A metric valued sequence $x=(x_{k})$ is said to be strongly $w_{p,q}^{d,\alpha}$-summable (or deferred strongly $d$-Cesàro summable of order $\alpha$) to $x_{0}$ if there is $x_{0}\in X$ such that

    $$\lim_{n\to\infty}\frac{1}{(q_{n}-p_{n})^{\alpha}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})=0.$$

    In this case we write $w_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ or $x_{k}\to x_{0}\,(w_{p,q}^{d,\alpha})$. The set of all strongly $w_{p,q}^{d,\alpha}$-summable sequences will be denoted by $w_{p,q}^{d,\alpha}$. If $q_{n}=n$ and $p_{n}=0$ for all $n\in\mathbb{N}$, then deferred strong $d$-Cesàro summability of order $\alpha$ coincides with strong $d$-Cesàro summability of order $\alpha$, denoted by $w^{d,\alpha}$. In the special case $q_{n}=n$, $p_{n}=0$ and $\alpha=1$, deferred strong $d$-Cesàro summability of order $\alpha$ coincides with strong $d$-Cesàro summability, denoted by $w^{d}$.
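
    Definitions 1 and 2 differ from their classical counterparts only in the normalizer $(q_{n}-p_{n})^{\alpha}$ and in the use of a general metric $d$. A brief sketch (our illustration, on $\mathbb{R}^{2}$ with the Euclidean metric; the sequence and parameter choices are assumptions) computes both quantities.

```python
import numpy as np

def d(u, v):                                       # Euclidean metric on R^2
    return np.linalg.norm(u - v, axis=-1)

def stat_defect_alpha(x, x0, eps, p_n, q_n, alpha):
    """|{k in (p_n, q_n] : x_k not in B_eps(x0)}| / (q_n - p_n)**alpha (Definition 1)."""
    return np.sum(d(x[p_n:q_n], x0) >= eps) / (q_n - p_n) ** alpha

def cesaro_defect_alpha(x, x0, p_n, q_n, alpha):
    """(sum_{k=p_n+1}^{q_n} d(x_k, x0)) / (q_n - p_n)**alpha (Definition 2)."""
    return np.sum(d(x[p_n:q_n], x0)) / (q_n - p_n) ** alpha

n = 10**5
k = np.arange(1, n + 1)
x = np.column_stack([1.0 / k, np.zeros(n)])        # x_k = (1/k, 0) in R^2, x0 = origin
x0 = np.zeros(2)
print(stat_defect_alpha(x, x0, eps=1e-2, p_n=0, q_n=n, alpha=0.8))   # ~0.01
print(cesaro_defect_alpha(x, x0, p_n=0, q_n=n, alpha=0.8))           # ~0.0012
```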

    Theorem 1. Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1), $(X,d)$ be a linear metric space, and $x=(x_{k})$, $y=(y_{k})$ be metric valued sequences. Then

    (i) If $S_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ and $S_{p,q}^{d,\alpha}$-$\lim y_{k}=y_{0}$, then $S_{p,q}^{d,\alpha}$-$\lim(x_{k}+y_{k})=x_{0}+y_{0}$;

    (ii) If $S_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ and $c\in\mathbb{C}$, then $S_{p,q}^{d,\alpha}$-$\lim(cx_{k})=cx_{0}$;

    (iii) If $S_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$, $S_{p,q}^{d,\alpha}$-$\lim y_{k}=y_{0}$ and $x,y\in\ell_{\infty}(X)$, then $S_{p,q}^{d,\alpha}$-$\lim(x_{k}y_{k})=x_{0}y_{0}$.

    Proof. Omitted.

    Theorem 2. Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1), and let $\alpha$ and $\beta$ be two real numbers such that $0<\alpha\leq\beta\leq 1$. If a sequence $x=(x_{k})$ is deferred strongly $d$-Cesàro summable of order $\alpha$ to $x_{0}$, then it is deferred $d$-statistically convergent of order $\beta$ to $x_{0}$, but the converse is not true.

    Proof. The first part of the proof is easy, so it is omitted. For the converse, take $X=\mathbb{R}$, choose $q_{n}=n$, $p_{n}=0$ (for all $n\in\mathbb{N}$), $d(x,y)=|x-y|$, and define a sequence $x=(x_{k})$ by

    $$x_{k}=\begin{cases}\sqrt[3]{n}, & k=n^{2},\\ 0, & k\neq n^{2}.\end{cases}$$

    Then for every $\varepsilon>0$, we have

    $$\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(0)\}\right|\leq\frac{\left[\sqrt{n}\right]}{n^{\alpha}}\to 0,\ \text{as}\ n\to\infty,$$

    where $\frac{1}{2}<\alpha\leq 1$, that is, $x_{k}\to 0\,(S_{p,q}^{d,\alpha})$. At the same time, we get

    $$\frac{1}{(q_{n}-p_{n})^{\alpha}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},0)\geq\frac{\sqrt[3]{\left[\sqrt{n}\right]}}{n^{\alpha}}\to 1$$

    for $\alpha=\frac{1}{6}$ and

    $$\frac{1}{(q_{n}-p_{n})^{\alpha}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},0)\geq\frac{\sqrt[3]{\left[\sqrt{n}\right]}}{n^{\alpha}}\to\infty$$

    for $0<\alpha<\frac{1}{6}$, i.e., $x_{k}\nrightarrow 0\,(w_{p,q}^{d,\alpha})$ for $0<\alpha\leq\frac{1}{6}$.
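
    A numerical look at the two quantities in this proof (our own check of the example above, with $\varepsilon=\frac{1}{2}$) shows the statistical-defect term shrinking for $\alpha>\frac{1}{2}$, while the summability quantity grows without bound at $\alpha=\frac{1}{6}$, consistent with $x_{k}\nrightarrow 0\,(w_{p,q}^{d,\alpha})$ there.

```python
import numpy as np

def theorem2_quantities(n, alpha, eps=0.5):
    k = np.arange(1, n + 1)
    root = np.round(np.sqrt(k)).astype(int)
    x = np.where(root ** 2 == k, np.cbrt(root), 0.0)     # x_{m^2} = cbrt(m), else 0
    stat = np.sum(x >= eps) / n ** alpha                 # statistical-defect quantity
    ces = np.sum(x) / n ** alpha                         # strong-summability quantity
    return stat, ces

for n in (10**4, 10**6):
    print(n, theorem2_quantities(n, alpha=0.75), theorem2_quantities(n, alpha=1 / 6))
```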

    From Theorem 2 we have the following results.

    Corollary 1. (i) Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1) and $\alpha$ be a real number such that $0<\alpha\leq 1$. If a sequence $x=(x_{k})$ is deferred strongly $d$-Cesàro summable of order $\alpha$ to $x_{0}$, then it is deferred $d$-statistically convergent of order $\alpha$ to $x_{0}$, but the converse is not true.

    (ii) Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1) and $\alpha$ be a real number such that $0<\alpha\leq 1$. If a sequence $x=(x_{k})$ is deferred strongly $d$-Cesàro summable of order $\alpha$ to $x_{0}$, then it is deferred $d$-statistically convergent to $x_{0}$, but the converse is not true.

    (iii) Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1). If a sequence $x=(x_{k})$ is deferred strongly $d$-Cesàro summable to $x_{0}$, then it is deferred $d$-statistically convergent to $x_{0}$, but the converse is not true.

    Remark. Even if $x=(x_{k})$ is a bounded sequence in a metric space, the converse of Theorem 2 (and so of Corollary 1 (i) and (ii)) does not hold in general. To show this, we give the following example.

    Example 1. Take $X=\mathbb{R}$, choose $q_{n}=n$, $p_{n}=0$ (for all $n\in\mathbb{N}$), $d(x,y)=|x-y|$, and define a sequence $x=(x_{k})$ by

    $$x_{k}=\begin{cases}\frac{1}{\sqrt{k}}, & k\neq n^{3},\\ 0, & k=n^{3},\end{cases}\qquad n=1,2,\dots$$

    It is clear that $x\in\ell_{\infty}$, and it can be shown that $x\in S^{d,\alpha}\setminus w^{d,\alpha}$ for $\frac{1}{3}<\alpha<\frac{1}{2}$.

    In the special case $\alpha=1$, we can give the following result.

    Theorem 3. Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1), and let $x=(x_{k})$ be a bounded sequence in a metric space $(X,d)$. If $x=(x_{k})$ is deferred $d$-statistically convergent to $x_{0}$, then it is deferred strongly $d$-Cesàro summable to $x_{0}$.

    Proof. Let $x=(x_{k})$ be deferred $d$-statistically convergent to $x_{0}$ and let $\varepsilon>0$ be given. Then

    $$\lim_{n\to\infty}\frac{1}{q_{n}-p_{n}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|=0.$$

    Since $x=(x_{k})$ is a bounded sequence in the metric space $X$, there exists a positive real number $M$ such that $d(x_{k},x_{0})<M$ for all $k\in\mathbb{N}$. So we have

    $$\frac{1}{q_{n}-p_{n}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})=\frac{1}{q_{n}-p_{n}}\sum_{\substack{k=p_{n}+1\\ d(x_{k},x_{0})\geq\varepsilon}}^{q_{n}}d(x_{k},x_{0})+\frac{1}{q_{n}-p_{n}}\sum_{\substack{k=p_{n}+1\\ d(x_{k},x_{0})<\varepsilon}}^{q_{n}}d(x_{k},x_{0})\leq\frac{M}{q_{n}-p_{n}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|+\varepsilon.$$

    Taking the limit as $n\to\infty$, we get $w_{p,q}^{d}$-$\lim x_{k}=x_{0}$.

    Theorem 4. Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1) and $\alpha$ be a real number such that $0<\alpha\leq 1$. If $\liminf_{n\to\infty}\frac{q_{n}}{p_{n}}>1$, then $S^{d,\alpha}\subseteq S_{p,q}^{d,\alpha}$.

    Proof. Suppose that $\liminf_{n\to\infty}\frac{q_{n}}{p_{n}}>1$; then there exists a $\nu>0$ such that $\frac{q_{n}}{p_{n}}\geq 1+\nu$ for sufficiently large $n$, which implies that

    $$\left(\frac{q_{n}-p_{n}}{q_{n}}\right)^{\alpha}\geq\left(\frac{\nu}{1+\nu}\right)^{\alpha}\ \Longrightarrow\ \frac{1}{q_{n}^{\alpha}}\geq\frac{\nu^{\alpha}}{(1+\nu)^{\alpha}}\cdot\frac{1}{(q_{n}-p_{n})^{\alpha}}.$$

    If $x_{k}\to x_{0}\,(S^{d,\alpha})$, then for every $\varepsilon>0$ and for sufficiently large $n$, we have

    $$\frac{1}{q_{n}^{\alpha}}\left|\{k\leq q_{n}:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{1}{q_{n}^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{\nu^{\alpha}}{(1+\nu)^{\alpha}}\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|.$$

    This completes the proof.
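
    For a concrete instance (ours, not the paper's), take $p_{n}=n$ and $q_{n}=2n$, so that $\liminf_{n\to\infty}\frac{q_{n}}{p_{n}}=2>1$ and one may take $\nu=1$. Then

    $$\frac{1}{q_{n}^{\alpha}}=\frac{1}{(2n)^{\alpha}}=\left(\frac{\nu}{1+\nu}\right)^{\alpha}\frac{1}{(q_{n}-p_{n})^{\alpha}},$$

    with equality throughout, so any index set that is negligible with respect to the normalizer $q_{n}^{\alpha}$ is also negligible with respect to $(q_{n}-p_{n})^{\alpha}$, and $S^{d,\alpha}\subseteq S_{p,q}^{d,\alpha}$ for this choice.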

    Theorem 5. Let $(p_{n})$ and $(q_{n})$ be sequences of non-negative integers satisfying the condition (1), and let $\alpha$ and $\beta$ be two real numbers such that $0<\alpha\leq\beta\leq 1$. If $\lim_{n\to\infty}\frac{(q_{n}-p_{n})^{\alpha}}{q_{n}^{\beta}}=s>0$, then $S^{d,\alpha}\subseteq S_{p,q}^{d,\beta}$.

    Proof. Let $\lim_{n\to\infty}\frac{(q_{n}-p_{n})^{\alpha}}{q_{n}^{\beta}}=s>0$. Notice that for each $\varepsilon>0$ the inclusion

    $$\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\subseteq\{k\leq q_{n}:x_{k}\notin B_{\varepsilon}(x_{0})\}$$

    is satisfied, and so we have the following inequality:

    $$\frac{1}{q_{n}^{\alpha}}\left|\{k\leq q_{n}:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{1}{q_{n}^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{1}{q_{n}^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|$$
    $$=\frac{(q_{n}-p_{n})^{\alpha}}{q_{n}^{\beta}}\cdot\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{(q_{n}-p_{n})^{\alpha}}{q_{n}^{\beta}}\cdot\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|.$$

    Therefore $S^{d,\alpha}\subseteq S_{p,q}^{d,\beta}$.
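
    Continuing the same instance (our illustration): with $p_{n}=n$, $q_{n}=2n$ and $\alpha=\beta$, we get

    $$\lim_{n\to\infty}\frac{(q_{n}-p_{n})^{\alpha}}{q_{n}^{\beta}}=\lim_{n\to\infty}\frac{n^{\alpha}}{(2n)^{\alpha}}=2^{-\alpha}=s>0,$$

    so Theorem 5 yields $S^{d,\alpha}\subseteq S_{p,q}^{d,\alpha}$, recovering the conclusion of Theorem 4 for this pair; for $\alpha<\beta$ the ratio equals $2^{-\beta}n^{\alpha-\beta}\to 0$, and the theorem gives no information.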

    Theorem 6. Let $(p_{n})$, $(q_{n})$, $(p'_{n})$ and $(q'_{n})$ be four sequences of non-negative integers such that

    $$p'_{n}\leq p_{n}<q_{n}\leq q'_{n}\quad\text{for all}\ n\in\mathbb{N},\tag{2}$$

    and let $\alpha,\beta$ be fixed real numbers such that $0<\alpha\leq\beta\leq 1$. Then

    (i) If

    $$\lim_{n\to\infty}\frac{(q_{n}-p_{n})^{\alpha}}{(q'_{n}-p'_{n})^{\beta}}=a>0,\tag{3}$$

    then $S_{p',q'}^{d,\beta}\subseteq S_{p,q}^{d,\alpha}$;

    (ii) If

    $$\lim_{n\to\infty}\frac{q'_{n}-p'_{n}}{(q_{n}-p_{n})^{\beta}}=1,\tag{4}$$

    then $S_{p,q}^{d,\alpha}\subseteq S_{p',q'}^{d,\beta}$.

    Proof. (i) Let (3) be satisfied. For given $\varepsilon>0$ we have

    $$\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\subseteq\{k\in(p'_{n},q'_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\},$$

    and so

    $$\frac{1}{(q'_{n}-p'_{n})^{\beta}}\left|\{k\in(p'_{n},q'_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\geq\frac{(q_{n}-p_{n})^{\alpha}}{(q'_{n}-p'_{n})^{\beta}}\cdot\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|.$$

    Therefore $S_{p',q'}^{d,\beta}\subseteq S_{p,q}^{d,\alpha}$.

    (ii) Let (4) be satisfied and $x=(x_{k})$ be a deferred $d$-statistically convergent sequence of order $\alpha$ to $x_{0}$. Then for given $\varepsilon>0$, we have

    $$\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p'_{n},q'_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\leq\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p'_{n},p_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|+\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(q_{n},q'_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|+\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|$$
    $$\leq\frac{(p_{n}-p'_{n})+(q'_{n}-q_{n})}{(q_{n}-p_{n})^{\beta}}+\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|=\frac{(q'_{n}-p'_{n})-(q_{n}-p_{n})}{(q_{n}-p_{n})^{\beta}}+\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|$$
    $$\leq\frac{(q'_{n}-p'_{n})-(q_{n}-p_{n})^{\beta}}{(q_{n}-p_{n})^{\beta}}+\frac{1}{(q_{n}-p_{n})^{\beta}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|\leq\left(\frac{q'_{n}-p'_{n}}{(q_{n}-p_{n})^{\beta}}-1\right)+\frac{1}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:x_{k}\notin B_{\varepsilon}(x_{0})\}\right|.$$

    Therefore $S_{p,q}^{d,\alpha}\subseteq S_{p',q'}^{d,\beta}$.

    Theorem 7. Let $(p_{n})$, $(q_{n})$, $(p'_{n})$ and $(q'_{n})$ be four sequences of non-negative integers as in (2), and let $\alpha,\beta$ be fixed real numbers such that $0<\alpha\leq\beta\leq 1$.

    (i) If (3) holds, then $w_{p',q'}^{d,\beta}\subseteq w_{p,q}^{d,\alpha}$;

    (ii) If (4) holds and $x=(x_{k})$ is a bounded sequence, then $w_{p,q}^{d,\alpha}\subseteq w_{p',q'}^{d,\beta}$.

    Proof. (i) Omitted.

    (ii) Suppose that $w_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ and $(x_{k})\in\ell_{\infty}(X)$. Then there exists some $M>0$ such that $d(x_{k},x_{0})<M$ for all $k$, and then

    $$\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k=p'_{n}+1}^{q'_{n}}d(x_{k},x_{0})=\frac{1}{(q_{n}-p_{n})^{\beta}}\left[\sum_{k=p'_{n}+1}^{p_{n}}+\sum_{k=p_{n}+1}^{q_{n}}+\sum_{k=q_{n}+1}^{q'_{n}}\right]d(x_{k},x_{0})$$
    $$\leq\frac{(p_{n}-p'_{n})+(q'_{n}-q_{n})}{(q_{n}-p_{n})^{\beta}}M+\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})\leq\frac{(q'_{n}-p'_{n})-(q_{n}-p_{n})^{\beta}}{(q_{n}-p_{n})^{\beta}}M+\frac{1}{(q_{n}-p_{n})^{\alpha}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})$$
    $$=\left(\frac{q'_{n}-p'_{n}}{(q_{n}-p_{n})^{\beta}}-1\right)M+\frac{1}{(q_{n}-p_{n})^{\alpha}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0}).$$

    Taking the limit as $n\to\infty$ and using (4), we obtain $w_{p',q'}^{d,\beta}$-$\lim x_{k}=x_{0}$.

    Theorem 8. Let $(p_{n})$, $(q_{n})$, $(p'_{n})$ and $(q'_{n})$ be four sequences of non-negative integers as in (2), and let $\alpha,\beta$ be fixed real numbers such that $0<\alpha\leq\beta\leq 1$. Then

    (i) If (3) holds and a sequence is strongly $w_{p',q'}^{d,\beta}$-summable to $x_{0}$, then it is $S_{p,q}^{d,\alpha}$-convergent to $x_{0}$;

    (ii) If (4) holds and $x=(x_{k})$ is a bounded sequence in $(X,d)$ that is $S_{p,q}^{d,\alpha}$-convergent to $x_{0}$, then it is strongly $w_{p',q'}^{d,\beta}$-summable to $x_{0}$.

    Proof. (i) Omitted.

    (ii) Suppose that $S_{p,q}^{d,\alpha}$-$\lim x_{k}=x_{0}$ and $(x_{k})\in\ell_{\infty}(X)$. Then there exists some $M>0$ such that $d(x_{k},x_{0})<M$ for all $k$, and then for every $\varepsilon>0$ we may write

    $$\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k=p'_{n}+1}^{q'_{n}}d(x_{k},x_{0})=\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k\in(p'_{n},q'_{n}]\setminus(p_{n},q_{n}]}d(x_{k},x_{0})+\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})$$
    $$\leq\frac{(q'_{n}-p'_{n})-(q_{n}-p_{n})}{(q_{n}-p_{n})^{\beta}}M+\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{k=p_{n}+1}^{q_{n}}d(x_{k},x_{0})\leq\left(\frac{q'_{n}-p'_{n}}{(q_{n}-p_{n})^{\beta}}-1\right)M+\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{\substack{k=p_{n}+1\\ d(x_{k},x_{0})\geq\varepsilon}}^{q_{n}}d(x_{k},x_{0})+\frac{1}{(q_{n}-p_{n})^{\beta}}\sum_{\substack{k=p_{n}+1\\ d(x_{k},x_{0})<\varepsilon}}^{q_{n}}d(x_{k},x_{0})$$
    $$\leq\left(\frac{q'_{n}-p'_{n}}{(q_{n}-p_{n})^{\beta}}-1\right)M+\frac{M}{(q_{n}-p_{n})^{\alpha}}\left|\{k\in(p_{n},q_{n}]:d(x_{k},x_{0})\geq\varepsilon\}\right|+\frac{q_{n}-p_{n}}{(q_{n}-p_{n})^{\beta}}\varepsilon.$$

    This completes the proof.

    The authors declare that they have no conflict of interest.

    References



    [1] P. Branco, L. Torgo, R. P. Ribeiro, A survey of predictive modeling on imbalanced domains, ACM Comput. Surv. (CSUR), 49 (2016), 1–50. https://doi.org/10.1145/2907070 doi: 10.1145/2907070
    [2] K. Oksuz, B. C. Cam, S. Kalkan, E. Akbas, Imbalance problems in object detection: A review, IEEE T. Pattern Anal., 43 (2021), 3388–3415. https://doi.org/10.1109/TPAMI.2020.2981890 doi: 10.1109/TPAMI.2020.2981890
    [3] M. Ghorbani, A. Kazi, M. S. Baghshah, H. R. Rabiee, N. Navab, RA-GCN: Graph convolutional network for disease prediction problems with imbalanced data, Med. Image Anal., 75 (2022), 102272. https://doi.org/10.1016/j.media.2021.102272 doi: 10.1016/j.media.2021.102272
    [4] Y. C. Wang, C. H Cheng, A multiple combined method for rebalancing medical data with class imbalances, Comput. Biol. Med., 134 (2021), 104527. https://doi.org/10.1016/j.compbiomed.2021.104527 doi: 10.1016/j.compbiomed.2021.104527
    [5] A. Abdelkhalek, M. Mashaly, Addressing the class imbalance problem in network intrusion detection systems using data resampling and deep learning, J. Supercomput., 79 (2023), 10611–10644. https://doi.org/10.1007/s11227-023-05073-x doi: 10.1007/s11227-023-05073-x
    [6] Z. Li, K. Kamnitsas, B. Glocker, Analyzing overfitting under class imbalance in neural networks for image segmentation, IEEE T. Med. Imaging, 40 (2021), 1065–1077. https://doi.org/10.1109/TMI.2020.3046692 doi: 10.1109/TMI.2020.3046692
    [7] V. Rupapara, F. Rustam, H. F. Shahzad, A. Mehmood, I. Ashraf, G. S. Choi, Impact of SMOTE on imbalanced text features for toxic comments classification using RVVC model, IEEE Access, 9 (2021), 78621–78634. https://doi.org/10.1109/ACCESS.2021.3083638 doi: 10.1109/ACCESS.2021.3083638
    [8] W. Zheng, Y. Xun, X. Wu, Z. Deng, X. Chen, Y. Sui, A comparative study of class rebalancing methods for security bug report classification, IEEE T. Reliab., 70 (2021), 1658–1670. https://doi.org/10.1109/TR.2021.3118026 doi: 10.1109/TR.2021.3118026
    [9] J. Kuang, G. Xu, T. Tao, Q. Wu, Class-imbalance adversarial transfer learning network for cross-domain fault diagnosis with imbalanced data, IEEE T. Instrum. Meas., 71 (2021), 1–11. https://doi.org/10.1109/TIM.2021.3136175 doi: 10.1109/TIM.2021.3136175
    [10] M. Qian, Y. F. Li, A weakly supervised learning-based oversampling framework for class-imbalanced fault diagnosis, IEEE T. Reliab., 71 (2022), 429–442. https://doi.org/10.1109/TR.2021.3138448 doi: 10.1109/TR.2021.3138448
    [11] Y. Aydın, Ü. Işıkdağ, G. Bekdaş, S. M. Nigdeli, Z. W. Geem, Use of machine learning techniques in soil classification, Sustainability, 15 (2023), 2374. https://doi.org/10.3390/su15032374 doi: 10.3390/su15032374
    [12] M. Asgari, W. Yang, M. Farnaghi, Spatiotemporal data partitioning for distributed random forest algorithm: Air quality prediction using imbalanced big spatiotemporal data on spark distributed framework, Environ. Technol. Inno., 27 (2022), 102776. https://doi.org/10.1016/j.eti.2022.102776 doi: 10.1016/j.eti.2022.102776
    [13] L. Dou, F. Yang, L. Xu, Q. Zou, A comprehensive review of the imbalance classification of protein post-translational modifications, Brief. Bioinform., 22 (2021), bbab089. https://doi.org/10.1093/bib/bbab089 doi: 10.1093/bib/bbab089
    [14] S. Y. Bae, J. Lee, J. Jeong, C. Lim, J. Choi, Effective data-balancing methods for class-imbalanced genotoxicity datasets using machine learning algorithms and molecular fingerprints, Comput. Toxicol., 20 (2021), 100178. https://doi.org/10.1016/j.comtox.2021.100178 doi: 10.1016/j.comtox.2021.100178
    [15] G. H. Fu, Y. J. Wu, M. J. Zong, J. Pan, Hellinger distance-based stable sparse feature selection for high-dimensional class-imbalanced data, BMC Bioinformatics, 21 (2020), 121. https://doi.org/10.1186/s12859-020-3411-3 doi: 10.1186/s12859-020-3411-3
    [16] N. V. Chawla, K. W. Bowyer, L. O. Hall, W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, J. Artif. Intell. Res., 16 (2002), 321–357. https://doi.org/10.1613/jair.953 doi: 10.1613/jair.953
    [17] G. E. A. P. A. Batista, R. C. Prati, M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explor. Newslett., 6 (2004), 20–29. https://doi.org/10.1145/1007730.1007735 doi: 10.1145/1007730.1007735
    [18] H. He, Y. Bai, E. A. Garcia, S. Li, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, In: 2008 IEEE international joint conference on neural networks (IEEE world congress on computational intelligence), IEEE Press, 2008. https://doi.org/10.1109/IJCNN.2008.4633969
    [19] M. Kubat, S. Matwin, Addressing the curse of imbalanced training sets: one-sided selection, In: International Conference of Machine Learning, Morgan Kaufmann, 1997.
    [20] M. A. Tahir, J. Kittler, F. Yan, Inverse random under sampling for class imbalance problem and its application to multi-label classification, Pattern Recogn., 45 (2012), 3738–3750. https://doi.org/10.1016/j.patcog.2012.03.014 doi: 10.1016/j.patcog.2012.03.014
    [21] A. Zhang, H. Yu, Z. Huan, X. Yang, S. Zheng, S. Gao, SMOTE-RkNN: A hybrid re-sampling method based on SMOTE and reverse k-nearest neighbors, Inform. Sci., 595 (2022), 70–88. https://doi.org/10.1016/j.ins.2022.02.038 doi: 10.1016/j.ins.2022.02.038
    [22] R. Batuwita, V. Palade, FSVM-CIL: Fuzzy support vector machines for class imbalance learning, IEEE T. Fuzzy Syst., 18 (2010), 558–571. https://doi.org/10.1109/TFUZZ.2010.2042721 doi: 10.1109/TFUZZ.2010.2042721
    [23] H. Yu, C. Sun, X. Yang, S. Zheng, H Zou, Fuzzy support vector machine with relative density information for classifying imbalanced data, IEEE T. Fuzzy Syst., 27 (2019), 2353–2367. https://doi.org/10.1109/TFUZZ.2019.2898371 doi: 10.1109/TFUZZ.2019.2898371
    [24] H. Yu, C. Mu, C. Sun, W. Yang, X. Yang, X. Zuo, Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowl.-Based Syst., 76 (2015), 67–78. https://doi.org/10.1016/j.knosys.2014.12.007 doi: 10.1016/j.knosys.2014.12.007
    [25] H. Yu, C. Sun, X. Yang, W. Yang, J. Shen, Y. Qi, ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data, Knowl.-Based Syst., 92 (2016), 55–70. https://doi.org/10.1016/j.knosys.2015.10.012 doi: 10.1016/j.knosys.2015.10.012
    [26] J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, In: Artificial Intelligence in Medicine: 8th Conference on Artificial Intelligence in Medicine in Europe, AIME 2001 Cascais, Portugal, Springer Berlin Heidelberg, 2001. https://doi.org/10.1007/3-540-48229-6_9
    [27] F. S. Hanifah, H. Wijayanto, A. Kurnia, Smotebagging algorithm for imbalanced dataset in logistic regression analysis (case: Credit of bank X), Appl. Math. Sci., 9 (2015), 6857–6865. http://dx.doi.org/10.12988/ams.2015.58562
    [28] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse, A. Napolitano, RUSBoost: Improving classification performance when training data is skewed, In: 19th international conference on pattern recognition, IEEE, 2008. https://doi.org/10.1002/abio.370040210
    [29] Y. Zhang, G. Liu, W. Luan, C. Yan, C. Jiang, An approach to class imbalance problem based on Stacking and inverse random under sampling methods, In: 2018 IEEE 15th international conference on networking, sensing and control (ICNSC), IEEE, 2018. https://doi.org/10.1002/abio.370040210
    [30] Y. Pristyanto, A. F. Nugraha, I. Pratama, A. Dahlan, L. A. Wirasakti, Dual approach to handling imbalanced class in datasets using oversampling and ensemble learning techniques, In: 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM), IEEE, 2021. https://doi.org/10.1002/abio.370040210
    [31] Z. Seng, S. A. Kareem, K. D. Varathan, A neighborhood undersampling stacked ensemble (NUS-SE) in imbalanced classification, Exp. Syst. Appl., 168 (2021), 114246. https://doi.org/10.1016/j.eswa.2020.114246 doi: 10.1016/j.eswa.2020.114246
    [32] D. H. Wolpert, Stacked generalization, Neural Networks, 5 (1992), 241–259. https://doi.org/10.1016/S0893-6080(05)80023-1 doi: 10.1016/S0893-6080(05)80023-1
    [33] Y. Shi, R. Eberhart, A modified particle swarm optimizer, In: Proceedings of 1998 IEEE international conference on evolutionary computation proceedings. IEEE world congress on computational intelligence (Cat. No. 98TH8360), IEEE, 1998, 69–73. https://doi.org/10.1109/icec.1998.699146
    [34] K. V. Price, Differential evolution: A fast and simple numerical optimizer, In: Proceedings of North American fuzzy information processing, IEEE, 1996,524–527. https://doi.org/10.1109/nafips.1996.534790
    [35] E. Cuevas, M. Cienfuegos, D. Zaldívar, M. Pérez-Cisneros, A swarm optimization algorithm inspired in the behavior of the social-spider, Exp. Syst. Appl., 40 (2013), 6374–6384. https://doi.org/10.1016/j.eswa.2013.05.041 doi: 10.1016/j.eswa.2013.05.041
    [36] S. Mirjalili, A. Lewis, The whale optimization algorithm, Adv. Eng. Soft., 95 (2016), 51–67. https://doi.org/10.1016/j.advengsoft.2016.01.008 doi: 10.1016/j.advengsoft.2016.01.008
    [37] E. Cuevas, A. Rodríguez, M. Perez, J. Murillo-Olmos, B. Morales-Castañ eda, A. Alejo-Reyes, et al., Optimal evaluation of re-opening policies for COVID-19 through the use of metaheuristic schemes, Appl. Math. Model., 121 (2023), 506–523. https://doi.org/10.1016/j.apm.2023.05.012 doi: 10.1016/j.apm.2023.05.012
    [38] M. H. Nadimi-Shahraki, S. Taghian, S. Mirjalili, L. Abualigah, M. Abd Elaziz, D. Oliva, EWOA-OPF: Effective whale optimization algorithm to solve optimal power flow problem, Electronics, 10 (2021), 2975. https://doi.org/10.1007/978-981-16-9447-9_20 doi: 10.1007/978-981-16-9447-9_20
    [39] R. Kundu, S. Chattopadhyay, E. Cuevas, R. Sarkar, AltWOA: Altruistic Whale Optimization Algorithm for feature selection on microarray datasets, Comput. Biol. Med., 144 (2022), 105349. https://doi.org/10.1016/j.compbiomed.2022.105349 doi: 10.1016/j.compbiomed.2022.105349
    [40] M. S. Santos, P. H. Abreu, N. Japkowicz, A. Fernández, C. Soares, S. Wilk, et al., On the joint-effect of class imbalance and overlap: a critical review, Artif. Intell. Rev., 55 (2022), 6207–6275. https://doi.org/10.1007/s10462-022-10150-3 doi: 10.1007/s10462-022-10150-3
    [41] S. K. Pandey, A. K. Tripathi, An empirical study toward dealing with noise and class imbalance issues in software defect prediction, Soft Comput., 25 (2021), 13465–13492. https://doi.org/10.1007/s00500-021-06096-3 doi: 10.1007/s00500-021-06096-3
    [42] L. Breiman, Bagging predictors, Mach. Learn., 24 (1996), 123–140. https://doi.org/10.1007/BF00058655 doi: 10.1007/BF00058655
    [43] R E. Schapire, The strength of weak learnability, Mach. Learn., 5 (1990), 197–227. https://doi.org/10.1007/BF00116037 doi: 10.1007/BF00116037
    [44] A. Krogh, J. Vedelsby, Neural network ensembles, cross validation, and active learning, Adv. Neural Inform. Proces. Syst., 7 (1995), 231–238. Available from: http://papers.nips.cc/paper/1001-neural-network-ensembles-cross-validation-and-active-learning.
    [45] S. Zhang, X. Li, M. Zong, X. Zhu, R. Wang, Efficient kNN classification with different numbers of nearest neighbors, IEEE T. Neur. Net. Learn., 29 (2018), 1774–1785. https://doi.org/10.1109/TNNLS.2017.2673241 doi: 10.1109/TNNLS.2017.2673241
    [46] J R. Quinlan, Induction of decision trees, Mach. Learn., 1 (1986), 81–106. https://doi.org/10.1023/A:1022643204877 doi: 10.1023/A:1022643204877
    [47] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn., 20 (1995), 273–297. https://doi.org/10.1007/BF00994018 doi: 10.1007/BF00994018
    [48] T. Bayes, An essay towards solving a problem in the doctrine of chances, MD Comput. Comput. Med. Pract., 8 (1991), 376–418. https://doi.org/10.1002/abio.370040210 doi: 10.1002/abio.370040210
    [49] A. Tharwat, T. Gaber, A. Ibrahim, A. E. Hassanien, Linear discriminant analysis: A detailed tutorial, AI Commun., 30 (2017), 169–190. https://doi.org/10.3233/AIC-170729 doi: 10.3233/AIC-170729
    [50] X. Su, X. Yan, C. L. Tsai, Linear regression, WIRES Comput. Stat., 4 (2012), 275–294. https://doi.org/10.1002/wics.1198 doi: 10.1002/wics.1198
    [51] C. Blake, E. Keogh, C. J. Merz, UCI repository of machine learning databases, Department of Information and Computer Science, University of California, Irvine, CA, USA, 1998. Available from: http://www.ics.uci.edu/mlearn/MLRepository.html.
    [52] I. Triguero, S. González, J. M. Moyano, S. García, J. Alcalá-Fdez, J. Luengo, et al., KEEL 3.0: An open source software for multi-stage analysis in data mining international, J. Comput. Intell. Syst., 10 (2017), 1238–1249. https://doi.org/10.2991/ijcis.10.1.82 doi: 10.2991/ijcis.10.1.82
    [53] J. Demsar, Statistical comparisons of classifiers over multiple data sets, J. Mach. Learn. Res., 7 (2006), 1–30. Available from: http://jmlr.org/papers/v7/demsar06a.html.
    [54] S. García, A. Fernández, J. Luengo, F. Herrera, Advanced nonparametric tests for multiple comparisons in the design of experiments in computational intelligence and data mining: Experimental analysis of power, Inform. Sci., 180 (2010), 2044–2064. https://doi.org/10.1016/j.ins.2009.12.010 doi: 10.1016/j.ins.2009.12.010
  • © 2024 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
