
A two-stage intrusion detection method based on light gradient boosting machine and autoencoder


  Intrusion detection systems can detect potential attacks and raise timely alerts. However, the curse of dimensionality and zero-day attacks pose challenges to intrusion detection systems. From a data perspective, the curse of dimensionality lowers the efficiency of intrusion detection systems. From an attack perspective, the increasing number of zero-day attacks overwhelms them. To address these problems, this paper proposes a novel detection framework based on the light gradient boosting machine (LightGBM) and an autoencoder. In this framework, the recursive feature elimination (RFE) method is first used for dimensionality reduction. Then a focal loss (FL) function is introduced into the LightGBM classifier to boost the learning of difficult samples. Finally, a two-stage prediction step with LightGBM and the autoencoder is performed. In the first stage, a pre-decision is made with LightGBM. In the second stage, a residual is used to make a secondary decision for samples classified as normal. The experiments were performed on the NSL-KDD and UNSW-NB15 datasets and compared with classical methods. The proposed method was found to be superior to the other methods and to reduce the time overhead. In addition, existing advanced methods were also compared in this study, and the results show that the proposed method achieves above 90% accuracy, recall, and F1 score on both datasets. It is further concluded that our method is valid when compared with other advanced techniques.

    Citation: Hao Zhang, Lina Ge, Guifen Zhang, Jingwei Fan, Denghui Li, Chenyang Xu. A two-stage intrusion detection method based on light gradient boosting machine and autoencoder[J]. Mathematical Biosciences and Engineering, 2023, 20(4): 6966-6992. doi: 10.3934/mbe.2023301




    It is well known that, in large-scale survey sampling, the use of several auxiliary variables improves the precision of estimators. In survey sampling, researchers have attempted to obtain estimates of population parameters, such as the mean and the median, that possess desirable statistical properties. For that purpose a representative part of the population is needed; when the population of interest is homogeneous, simple random sampling (SRS) can be used to select units. In some situations, auxiliary information is available in the form of attributes that are positively correlated with the study variable. Several authors, including Naik and Gupta [1], Jhajj [2], Abd-Elfattah [3], Koyuncu [4], Solanki [5], Sharma [6] and Malik [7], proposed estimators that exploit the point bi-serial correlation between the auxiliary attribute and the study variable, using information on a single auxiliary attribute. Verma [8], Malik [7], Solanki et al. [9] and Sharma [10] suggested estimators that use information on two auxiliary attributes in SRS. Mahdizadeh and Zamanzade [11] developed a kernel-based estimator of P(X > Y) in ranked set sampling, SinghPal and Solanki [12] developed a new class of estimators of the finite population mean in survey sampling, and Mahdizadeh and Zamanzade [13] suggested a smooth estimator of a reliability function in ranked set sampling. Hussain et al. [14] and Al-Marzouki et al. [15] have also worked in this area.

    In this article, we consider the problem of estimating the finite population mean using auxiliary proportions under simple random sampling and two-phase sampling schemes. The mathematical expressions for the bias and mean squared error (MSE) of the proposed estimators are derived to the first order of approximation. The performance of the proposed class of estimators is compared with that of the existing estimators both theoretically and numerically. In terms of percentage relative efficiency (PRE), the proposed class of estimators is found to outperform the existing ones.

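    Let $U = \{u_1, u_2, \ldots, u_N\}$ represent a finite population of $N$ distinct units, and assume that a sample of $n$ units is drawn from $U$ using simple random sampling without replacement (SRSWOR). Let $y_i$ and $\phi_{ij}$ $(j = 1, 2)$ denote the observations on the study variable $y$ and the auxiliary attribute $\phi_j$ for the $i$th unit $(i = 1, 2, \ldots, N)$, where

    $\phi_{ij} = 1$, if the $i$th unit possesses attribute $\phi_j$,

    $\phi_{ij} = 0$, otherwise.

    $P_j = \frac{\sum_{i=1}^{N}\phi_{ij}}{N} = \frac{A_j}{N}$ and $p_j = \frac{\sum_{i=1}^{n}\phi_{ij}}{n} = \frac{a_j}{n}$ $(j = 1, 2)$ are the population and sample proportions of the auxiliary attributes, respectively. Let $\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N}$ and $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$ be the population and sample means of the study variable $y$. $S_{\phi_j y} = \frac{\sum_{i=1}^{N}(\phi_{ij} - P_j)(y_i - \bar{Y})}{N - 1}$ $(j = 1, 2)$ are the covariances between the study variable and the auxiliary attributes, and $S_{\phi_1\phi_2} = \frac{\sum_{i=1}^{N}(\phi_{i1} - P_1)(\phi_{i2} - P_2)}{N - 1}$ is the covariance between the two auxiliary attributes. $\rho_{y\phi_j} = \frac{S_{\phi_j y}}{S_y S_{\phi_j}}$ represents the point bi-serial correlation between the study variable $y$ and the auxiliary attribute $\phi_j$ $(j = 1, 2)$. $\rho_{\phi_1\phi_2} = \frac{S_{\phi_1\phi_2}}{S_{\phi_1} S_{\phi_2}}$ represents the point bi-serial correlation between the two auxiliary attributes $\phi_1$ and $\phi_2$.

    Let us define $e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}$, $e_1 = \frac{p_1 - P_1}{P_1}$, $e_2 = \frac{p_2 - P_2}{P_2}$,

    such that $E(e_i) = 0$ $(i = 0, 1, 2)$,

    $E(e_0^2) = fC_y^2 = V_{200}$, $E(e_1^2) = fC_{\phi_1}^2 = V_{020}$, $E(e_2^2) = fC_{\phi_2}^2 = V_{002}$,

    $E(e_0e_1) = f\rho_{y\phi_1}C_yC_{\phi_1} = V_{110}$, $E(e_0e_2) = f\rho_{y\phi_2}C_yC_{\phi_2} = V_{101}$,

    $E(e_1e_2) = f\rho_{\phi_1\phi_2}C_{\phi_1}C_{\phi_2} = V_{011}$,

    where $C_y = \frac{S_y}{\bar{Y}}$ and $C_{\phi_j} = \frac{S_{\phi_j}}{P_j}$ $(j = 1, 2)$ are the coefficients of variation of the study variable and the auxiliary attributes, $S_y^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N - 1}$ and $S_{\phi_j}^2 = \frac{\sum_{i=1}^{N}(\phi_{ij} - P_j)^2}{N - 1}$ $(j = 1, 2)$ are the variances of the study variable and the auxiliary attributes, and $f = \left(\frac{1}{n} - \frac{1}{N}\right)$ is the correction factor.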
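    The notation above can be made concrete with a short computation. The following is a minimal sketch, not the authors' code: it evaluates the proportions, coefficients of variation, point bi-serial correlations and the terms $V_{200}, \ldots, V_{011}$ for a small population. The function name `srs_moments` and the eight-unit population are hypothetical and serve only as an illustration.

```python
from math import sqrt

def srs_moments(y, phi1, phi2, n):
    """Population summaries and V_rst terms for an SRSWOR sample of size n."""
    N = len(y)

    def mean(v):
        return sum(v) / N

    def cov(a, b):
        # Population (co)variance with divisor N - 1, as in the text.
        ma, mb = mean(a), mean(b)
        return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (N - 1)

    Y_bar, P1, P2 = mean(y), mean(phi1), mean(phi2)
    S_y, S_1, S_2 = sqrt(cov(y, y)), sqrt(cov(phi1, phi1)), sqrt(cov(phi2, phi2))
    C_y, C_1, C_2 = S_y / Y_bar, S_1 / P1, S_2 / P2   # coefficients of variation
    rho_y1 = cov(y, phi1) / (S_y * S_1)               # point bi-serial correlations
    rho_y2 = cov(y, phi2) / (S_y * S_2)
    rho_12 = cov(phi1, phi2) / (S_1 * S_2)
    f = 1 / n - 1 / N                                 # correction factor
    return {
        "Y_bar": Y_bar, "P1": P1, "P2": P2, "f": f,
        "V200": f * C_y**2, "V020": f * C_1**2, "V002": f * C_2**2,
        "V110": f * rho_y1 * C_y * C_1,
        "V101": f * rho_y2 * C_y * C_2,
        "V011": f * rho_12 * C_1 * C_2,
    }

# Hypothetical population of N = 8 units (illustration only).
y = [12, 7, 15, 3, 9, 20, 5, 11]
phi1 = [1, 0, 1, 0, 1, 1, 0, 1]
phi2 = [1, 0, 1, 0, 0, 1, 0, 0]
m = srs_moments(y, phi1, phi2, n=4)
```

    Returning the terms in a dictionary keyed by the subscripts keeps later formulas close to the notation used in the text.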

    The rest of the paper is organized as follows. Sections 1.1 and 1.2 give the introduction and notation for simple random sampling and two-phase sampling. Sections 2.1 and 2.3 discuss some existing estimators of the finite population mean for both sampling designs. The proposed estimators are given in Sections 2.2 and 2.4. Theoretical comparisons are conducted in Sections 3.1 and 3.2, and Sections 4.1 and 4.2 focus on empirical studies. Finally, applications and conclusions are given in Sections 5 and 6.

    The precision of an estimate can be increased in two ways. First, precision may be increased by using an adequate sampling design for the estimated variable. Second, precision may be increased by using an appropriate estimation procedure, i.e., one that uses auxiliary information closely associated with the variable under study. In applications, situations arise in which complete auxiliary information on an attribute is not available, or in which collecting it is expensive. In that case, two-phase sampling (double sampling) is used to estimate the unknown population parameters. In two-phase sampling, a large preliminary sample of size $n'$ is selected by SRSWOR at the first phase, and information on the auxiliary variable is collected from it and used to estimate the unknown parameter of the auxiliary variable. Then a subsample of size $n$ $(n < n')$ is selected at the second phase, and both the study and auxiliary variables are observed. Here we assume that the population proportion $P_1$ is unknown and introduce an improved estimator of the population mean. Kiregyera [16], Mohanty [17], Malik [7] and Haq [18] used two auxiliary variables in two-phase sampling for better estimation of the mean.

    As an example, when estimating the yield of a crop, the area under the crop may be unknown while the area of each farm is known. Then $y$, $P_1$ and $P_2$ correspond, respectively, to the yield, the area under the crop and the area under cultivation.

    Consider a finite population $U = (u_1, u_2, \ldots, u_N)$ of size $N$, and let $y_i$, $\phi_{i1}$ and $\phi_{i2}$ be the values of the study variable and the two auxiliary attributes associated with each unit $u_i$ $(i = 1, 2, \ldots, N)$ of the population, such that

    $\phi_{ij} = 1$ if the $i$th unit in the population possesses auxiliary attribute $\phi_j$, and $\phi_{ij} = 0$ otherwise.

    We assume that the population proportion $P_1$ of the first auxiliary attribute is unknown, while the population proportion $P_2$ of the second attribute is known. Let $p_j' = \frac{\sum_{i=1}^{n'}\phi_{ij}}{n'} = \frac{a_j'}{n'}$ for $j = 1, 2$ be the estimate of $P_j$ obtained from the first-phase sample of size $n'$, drawn by SRSWOR from the population of $N$ units. Let $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$ and $p_1 = \frac{\sum_{i=1}^{n}\phi_{i1}}{n} = \frac{a_1}{n}$ be the estimates of $\bar{Y}$ and $P_1$, respectively, obtained from the second-phase sample of size $n$, drawn by SRSWOR from the first-phase sample of size $n'$.
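    The two-phase design described above can be sketched in a few lines. This is an illustrative sketch only: the helper `two_phase_sample`, the fixed seed and the eight-unit population are hypothetical, and primes on the first-phase quantities are written as a `_first` suffix.

```python
import random

def two_phase_sample(population, n_first, n_second, seed=0):
    """Draw a first-phase SRSWOR sample and a nested second-phase subsample."""
    if not n_second < n_first <= len(population):
        raise ValueError("need n_second < n_first <= N")
    rng = random.Random(seed)
    first = rng.sample(population, n_first)   # phase 1: SRSWOR from U
    second = rng.sample(first, n_second)      # phase 2: SRSWOR from phase 1
    return first, second

# Hypothetical units (y, phi1, phi2); illustration only.
population = [(12, 1, 1), (7, 0, 0), (15, 1, 1), (3, 0, 0),
              (9, 1, 0), (20, 1, 1), (5, 0, 0), (11, 1, 0)]
first, second = two_phase_sample(population, n_first=5, n_second=3)

# First-phase estimates of the auxiliary proportions (p1', p2').
p1_first = sum(u[1] for u in first) / len(first)
p2_first = sum(u[2] for u in first) / len(first)

# Second-phase estimates of the study mean and of P1.
y_bar = sum(u[0] for u in second) / len(second)
p1_second = sum(u[1] for u in second) / len(second)
```

    Only the auxiliary attributes are needed for the first-phase units, which is what makes the design attractive when the study variable is expensive to observe.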

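    To obtain the bias and MSE of estimators in two-phase sampling, we define the error terms as follows:

    $e_0 = \frac{\bar{y} - \bar{Y}}{\bar{Y}}$, $e_1 = \frac{p_1 - P_1}{P_1}$, $e_2 = \frac{p_2 - P_2}{P_2}$, $e_1' = \frac{p_1' - P_1}{P_1}$, with $e_2' = \frac{p_2' - P_2}{P_2}$ defined analogously for the first-phase sample,

    such that:

    $E(e_0) = E(e_1) = E(e_2) = E(e_1') = 0$,

    $E(e_0^2) = fC_y^2 = V_{200}$,    $E(e_1^2) = fC_{\phi_1}^2 = V_{020}$,    $E(e_2^2) = fC_{\phi_2}^2 = V_{002}$,

    $E(e_0e_1) = f\rho_{y\phi_1}C_yC_{\phi_1} = V_{110}$,    $E(e_0e_2) = f\rho_{y\phi_2}C_yC_{\phi_2} = V_{101}$,

    $E(e_1e_2) = f\rho_{\phi_1\phi_2}C_{\phi_1}C_{\phi_2} = V_{011}$,    $E(e_1'e_0) = f'\rho_{y\phi_1}C_yC_{\phi_1} = V'_{110}$,

    $E(e_0e_2') = f'\rho_{y\phi_2}C_yC_{\phi_2} = V'_{101}$,    $E(e_1'^2) = f'C_{\phi_1}^2 = V'_{020}$,

    $E(e_2'^2) = f'C_{\phi_2}^2 = V'_{002}$,    $f = \frac{1}{n} - \frac{1}{N}$,    $f' = \frac{1}{n'} - \frac{1}{N}$,

    $\bar{Y} = \frac{\sum_{i=1}^{N} y_i}{N}$,    $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$,

    $S_y^2 = \frac{\sum_{i=1}^{N}(y_i - \bar{Y})^2}{N - 1}$,    $S_{y\phi_j} = \frac{\sum_{i=1}^{N}(\phi_{ij} - P_j)(y_i - \bar{Y})}{N - 1}$,

    $S_{\phi_1\phi_2} = \frac{\sum_{i=1}^{N}(\phi_{i1} - P_1)(\phi_{i2} - P_2)}{N - 1}$,    $C_y = \frac{S_y}{\bar{Y}}$,

    $C_{\phi_j} = \frac{S_{\phi_j}}{P_j}$,    $S_{\phi_j}^2 = \frac{\sum_{i=1}^{N}(\phi_{ij} - P_j)^2}{N - 1}$,

    $s_{\phi_j}'^2 = \frac{\sum_{i=1}^{n'}(\phi_{ij} - p_j')^2}{n' - 1}$ represents the sample variance for the sample of size $n'$,

    $s_{\phi_j}^2 = \frac{\sum_{i=1}^{n}(\phi_{ij} - p_j)^2}{n - 1}$ represents the sample variance for the sample of size $n$,

    $\rho_{y\phi_j} = \frac{S_{\phi_j y}}{S_y S_{\phi_j}}$ represents the point bi-serial correlation between the study variable ($y$) and the auxiliary attribute $\phi_j$ $(j = 1, 2)$,

    $\rho_{\phi_1\phi_2} = \frac{S_{\phi_1\phi_2}}{S_{\phi_1} S_{\phi_2}}$ represents the point bi-serial correlation between the two auxiliary attributes $\phi_1$ and $\phi_2$.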

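    The usual mean per unit estimator of the population mean is

    $t_U = \bar{y}$. (2.1)

    The MSE of $t_U$ is given by

    $\mathrm{MSE}(t_U) = \bar{Y}^2 V_{200}$. (2.2)

    To estimate the population mean of the study variable using information on the population proportions, Naik and Gupta [1] proposed the following estimators:

    $t_A = \bar{y}\left(\frac{P_1}{p_1}\right)$, (2.3)
    $t_B = \bar{y}\left(\frac{p_2}{P_2}\right)$, (2.4)
    $t_C = \bar{y}\exp\left(\frac{P_1 - p_1}{P_1 + p_1}\right)$, (2.5)
    $t_D = \bar{y}\exp\left(\frac{p_2 - P_2}{p_2 + P_2}\right)$. (2.6)

    The MSE expressions of the estimators $t_A$, $t_B$, $t_C$ and $t_D$ are, respectively,

    $\mathrm{MSE}(t_A) \approx \bar{Y}^2\left(V_{200} - 2V_{110} + V_{020}\right)$, (2.7)
    $\mathrm{MSE}(t_B) \approx \bar{Y}^2\left(V_{200} + 2V_{101} + V_{002}\right)$, (2.8)
    $\mathrm{MSE}(t_C) \approx \bar{Y}^2\left(V_{200} - V_{110} + \frac{1}{4}V_{020}\right)$, (2.9)
    $\mathrm{MSE}(t_D) \approx \bar{Y}^2\left(V_{200} + V_{101} + \frac{1}{4}V_{002}\right)$. (2.10)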
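    As a worked illustration of Eqs (2.2) and (2.7)–(2.10), the sketch below evaluates the relative MSEs and the resulting percentage relative efficiencies with respect to $t_U$, using the summary values listed for Population 1 under simple random sampling. The function name `srs_pre` is hypothetical, and the common factor $\bar{Y}^2$ is dropped because it cancels in the ratio. For $t_A$, $t_B$ and $t_C$ this gives roughly 133, 31 and 141, in line with the corresponding data set 1 entries of Table 3.

```python
def srs_pre(N, n, C_y, C_1, C_2, rho_y1, rho_y2):
    """PRE (relative to t_U) of t_A, t_B, t_C, t_D under SRS."""
    f = 1 / n - 1 / N
    V200 = f * C_y**2
    V020 = f * C_1**2
    V002 = f * C_2**2
    V110 = f * rho_y1 * C_y * C_1
    V101 = f * rho_y2 * C_y * C_2
    # Relative MSEs, i.e. MSE / Y_bar^2, from Eqs (2.2) and (2.7)-(2.10).
    rel_mse = {
        "t_U": V200,
        "t_A": V200 - 2 * V110 + V020,
        "t_B": V200 + 2 * V101 + V002,
        "t_C": V200 - V110 + V020 / 4,
        "t_D": V200 + V101 + V002 / 4,
    }
    return {name: 100 * V200 / value for name, value in rel_mse.items()}

# Summary values listed for Population 1 (SRS case).
pre = srs_pre(N=34, n=15, C_y=0.7531, C_1=0.6090231, C_2=0.7496556,
              rho_y1=0.559, rho_y2=0.6281)
```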

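    Malik [7] proposed the exponential-type estimator

    $t_{MS} = \bar{y}\exp\left(\frac{P_1 - p_1}{P_1 + p_1}\right)^{\gamma_1}\exp\left(\frac{P_2 - p_2}{P_2 + p_2}\right)^{\gamma_2} + b_1(P_1 - p_1) + b_2(P_2 - p_2)$, (2.11)

    where $b_1 = \frac{s_{y\phi_1}}{s_{\phi_1}^2}$ and $b_2 = \frac{s_{y\phi_2}}{s_{\phi_2}^2}$ are the sample regression coefficients, and $\gamma_1$ and $\gamma_2$ are two unknown constants. The optimum values of these constants are:

    $\gamma_{1(opt)} = \frac{2\left\{P_1\beta_1C_{\phi_1}\left(-1 + \rho_{\phi_1\phi_2}^2\right) + \bar{Y}C_y\left(-\rho_{y\phi_1} + \rho_{\phi_1\phi_2}\rho_{y\phi_2}\right)\right\}}{\bar{Y}C_{\phi_1}\left(-1 + \rho_{\phi_1\phi_2}^2\right)}$,
    $\gamma_{2(opt)} = \frac{2\left\{P_2\beta_2C_{\phi_2}\left(-1 + \rho_{\phi_1\phi_2}^2\right) + \bar{Y}C_y\left(-\rho_{y\phi_2} + \rho_{\phi_1\phi_2}\rho_{y\phi_1}\right)\right\}}{\bar{Y}C_{\phi_2}\left(-1 + \rho_{\phi_1\phi_2}^2\right)}$,

    where $\beta_1 = \frac{S_{y\phi_1}}{S_{\phi_1}^2}$ and $\beta_2 = \frac{S_{y\phi_2}}{S_{\phi_2}^2}$ are the population regression coefficients. The minimum mean squared error at the optimum values of $\gamma_1$ and $\gamma_2$ is:

    $\mathrm{MSE}(t_{MS})_{\min} \approx f\bar{Y}^2C_y^2\left(1 - R_{y\phi_1\phi_2}^2\right)$, (2.12)

    where $R_{y\phi_1\phi_2}^2 = \frac{\rho_{\phi_1y}^2 + \rho_{\phi_2y}^2 - 2\rho_{\phi_1y}\rho_{\phi_2y}\rho_{\phi_1\phi_2}}{1 - \rho_{\phi_1\phi_2}^2}$ is the squared multiple correlation coefficient of $y$ on $\phi_1$ and $\phi_2$.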
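    The squared multiple correlation and the bound in Eq (2.12) are easy to evaluate from the three pairwise correlations. The sketch below does this for the correlations listed for Population 1. The function names are hypothetical, and the result is computed directly from the formula as read here, without any claim that it reproduces a tabulated value.

```python
def multiple_r2(rho_y1, rho_y2, rho_12):
    """Squared multiple correlation of y on phi1 and phi2."""
    num = rho_y1**2 + rho_y2**2 - 2 * rho_y1 * rho_y2 * rho_12
    return num / (1 - rho_12**2)

def malik_min_mse(N, n, Y_bar, C_y, r2):
    """Minimum MSE of t_MS as given by Eq (2.12)."""
    f = 1 / n - 1 / N
    return f * Y_bar**2 * C_y**2 * (1 - r2)

r2 = multiple_r2(rho_y1=0.559, rho_y2=0.6281, rho_12=0.6729)
mse_ms = malik_min_mse(N=34, n=15, Y_bar=199.4412, C_y=0.7531, r2=r2)
mse_u = (1 / 15 - 1 / 34) * 199.4412**2 * 0.7531**2   # Eq (2.2) for comparison
```

    Because $R_{y\phi_1\phi_2}^2$ is at least as large as either squared pairwise correlation, the bound in Eq (2.12) can never exceed the MSE of $t_U$.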

    Some shorthand constants are used below so that the long expressions are easier to read.

    We propose a generalized class of estimators for estimating the mean in simple random sampling using two auxiliary attributes:

    $t_{RPR} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[\alpha\left\{2 - \exp\left(\frac{\eta(p_2 - P_2)}{\eta(p_2 + P_2) + 2\lambda}\right)\right\} + (1 - \alpha)\exp\left(\frac{\eta(P_2 - p_2)}{\eta(P_2 + p_2) + 2\lambda}\right)\right]$, (2.13)

    where $k_1$ and $k_2$ are suitable constants whose values are to be determined such that the MSE of $t_{RPR}$ is minimum; $\eta$ and $\lambda$ are either real numbers or functions of known parameters of the auxiliary attribute $\phi_2$, such as the coefficient of variation ($C_{\phi_2}$) and the coefficient of kurtosis ($\beta_2(\phi_2)$); and $\alpha$ is a scalar $(0 \le \alpha \le 1)$ used to generate different estimators. Let $\bar{Y}$ and $(P_1, P_2)$ be the population mean of the study variable and the population proportions of the auxiliary attributes, respectively, and let $\bar{y}$ and $(p_1, p_2)$ be the corresponding sample mean and sample proportions.

    Putting $\alpha = 1$ and $\alpha = 0$ in (2.13), we get the following estimators.

    For $\alpha = 1$, the suggested class of estimators reduces to:

    $t_{RPR(\alpha=1)} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[2 - \exp\left\{\frac{\eta(p_2 - P_2)}{\eta(p_2 + P_2) + 2\lambda}\right\}\right]$.

    For $\alpha = 0$, the suggested class of estimators reduces to:

    $t_{RPR(\alpha=0)} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\exp\left\{\frac{\eta(P_2 - p_2)}{\eta(P_2 + p_2) + 2\lambda}\right\}$.

    A set of new estimators generated from Eq (2.13) by suitable choices of $\alpha$, $\eta$ and $\lambda$ is listed in Table 1.

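    Table 1.  Set of estimators generated from estimator $t_{RPR}$.
    Subset of proposed estimator | $\alpha$ | $\eta$ | $\lambda$
    $t_{RPR1} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\exp\left\{\frac{C_{\phi_2}(P_2 - p_2)}{C_{\phi_2}(P_2 + p_2) + 2\beta_2(\phi_2)}\right\}$ | 0 | $C_{\phi_2}$ | $\beta_2(\phi_2)$
    $t_{RPR2} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\exp\left\{\frac{P_2(P_2 - p_2)}{P_2(P_2 + p_2) + 2}\right\}$ | 0 | $P_2$ | 1
    $t_{RPR3} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\exp\left\{\frac{P_2 - p_2}{(P_2 + p_2) + 2C_{\phi_2}}\right\}$ | 0 | 1 | $C_{\phi_2}$
    $t_{RPR4} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\exp\left\{\frac{P_2 - p_2}{(P_2 + p_2) + 2}\right\}$ | 0 | 1 | 1
    $t_{RPR5} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[2 - \exp\left\{\frac{C_{\phi_2}(p_2 - P_2)}{C_{\phi_2}(p_2 + P_2) + 2\beta_2(\phi_2)}\right\}\right]$ | 1 | $C_{\phi_2}$ | $\beta_2(\phi_2)$
    $t_{RPR6} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[2 - \exp\left\{\frac{P_2(p_2 - P_2)}{P_2(p_2 + P_2) + 2}\right\}\right]$ | 1 | $P_2$ | 1
    $t_{RPR7} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[2 - \exp\left\{\frac{p_2 - P_2}{(p_2 + P_2) + 2C_{\phi_2}}\right\}\right]$ | 1 | 1 | $C_{\phi_2}$
    $t_{RPR8} = \left[k_1\bar{y} - k_2(p_1 - P_1)\right]\left[2 - \exp\left\{\frac{p_2 - P_2}{(p_2 + P_2) + 2}\right\}\right]$ | 1 | 1 | 1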


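    Expressing Eq (2.13) in terms of the $e$'s, we have

    $t_{RPR} = \left[k_1\bar{Y}(1 + e_0) - k_2P_1e_1\right]\left[\alpha\left\{2 - \left(1 + \gamma e_2 - \frac{1}{2}\gamma^2e_2^2\right)\right\} + (1 - \alpha)\left(1 - \gamma e_2 + \frac{3}{2}\gamma^2e_2^2\right)\right]$, (2.14)

    where $\gamma = \frac{\eta P_2}{2(\eta P_2 + \lambda)}$.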
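    The origin of $\gamma$ and of the coefficients $\frac{1}{2}$ and $\frac{3}{2}$ in (2.14) can be seen from a second-order expansion. The following is a sketch of that step, obtained by substituting $p_2 = P_2(1 + e_2)$ and keeping terms up to $e_2^2$; it is a reconstruction of the intermediate algebra rather than a quotation from the paper.

```latex
\begin{align*}
\frac{\eta(p_2 - P_2)}{\eta(p_2 + P_2) + 2\lambda}
  &= \frac{\eta P_2 e_2}{2(\eta P_2 + \lambda) + \eta P_2 e_2}
   = \frac{\gamma e_2}{1 + \gamma e_2}
   \approx \gamma e_2 - \gamma^2 e_2^2, \\
\exp\!\left(\frac{\gamma e_2}{1 + \gamma e_2}\right)
  &\approx 1 + \gamma e_2 - \gamma^2 e_2^2 + \tfrac{1}{2}\gamma^2 e_2^2
   = 1 + \gamma e_2 - \tfrac{1}{2}\gamma^2 e_2^2, \\
\exp\!\left(-\frac{\gamma e_2}{1 + \gamma e_2}\right)
  &\approx 1 - \gamma e_2 + \gamma^2 e_2^2 + \tfrac{1}{2}\gamma^2 e_2^2
   = 1 - \gamma e_2 + \tfrac{3}{2}\gamma^2 e_2^2 .
\end{align*}
```

    Weighting the first expansion (inside $2 - \exp(\cdot)$) by $\alpha$ and the second by $1 - \alpha$ gives the combined factor $1 - \gamma e_2 + \left(\frac{3}{2} - \alpha\right)\gamma^2 e_2^2$.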

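    To the first degree of approximation, we have:

    $t_{RPR} - \bar{Y} \approx k_1\bar{Y} + k_1\bar{Y}e_0 - k_2P_1e_1 - k_1\gamma\bar{Y}e_2 - \gamma\bar{Y}k_1e_2e_0 + \frac{3}{2}\bar{Y}k_1e_2^2\gamma^2 - \alpha\bar{Y}k_1e_2^2\gamma^2 + \gamma k_2P_1e_1e_2 - \bar{Y}$. (2.15)

    Taking the expectation of the above equation, we get the bias of $t_{RPR}$:

    $\mathrm{Bias}(t_{RPR}) \approx \bar{Y}(k_1 - 1) - \gamma\bar{Y}k_1V_{101} + \bar{Y}k_1V_{002}\gamma^2\left(\frac{3}{2} - \alpha\right) + \gamma k_2P_1V_{011}$. (2.16)

    Squaring both sides of Eq (2.15) and taking expectations, we get the MSE of the estimator $t_{RPR}$ to the first order of approximation:

    $$\begin{aligned} E(t_{RPR} - \bar{Y})^2 \approx{} & \bar{Y}^2 + \bar{Y}^2k_1^2\left(1 - 4\gamma V_{101} - 2\alpha V_{002}\gamma^2 + 4\gamma^2V_{002} + V_{200}\right) \\ & - k_1\bar{Y}^2\left(2 - 2\alpha V_{002}\gamma^2 - 2\gamma V_{101} + 3V_{002}\gamma^2\right) \\ & + 2k_1k_2\bar{Y}\left(2\gamma P_1V_{011} - P_1V_{110}\right) \\ & - 2k_2\bar{Y}\left(\gamma P_1V_{011}\right) + k_2^2\left(P_1^2V_{020}\right), \end{aligned}$$ (2.17)
    $\mathrm{MSE}(t_{RPR}) \approx \bar{Y}^2 + \bar{Y}^2k_1^2A - k_1\bar{Y}^2B + 2k_1k_2\bar{Y}C - 2k_2\bar{Y}D + k_2^2E$, (2.18)

    where

    $A = 1 - 4\gamma V_{101} - 2\alpha V_{002}\gamma^2 + 4\gamma^2V_{002} + V_{200}$,
    $B = 2 - 2\alpha V_{002}\gamma^2 - 2\gamma V_{101} + 3V_{002}\gamma^2$,
    $C = 2\gamma P_1V_{011} - P_1V_{110}$, $D = \gamma P_1V_{011}$, $E = P_1^2V_{020}$.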

    The optimum values of $k_1$ and $k_2$, obtained by minimizing Eq (2.18), are

    $k_1 = \frac{BE - 2CD}{2(AE - C^2)}$

    and

    $k_2 = \frac{\bar{Y}(2AD - BC)}{2(AE - C^2)}$.

    Substituting the optimum values of $k_1$ and $k_2$ in Eq (2.18), we get the minimum MSE of $t_{RPR}$ as:

    $\mathrm{MSE}(t_{RPR})_{\min} = \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right]$. (2.19)
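    The optimisation step can be checked numerically. The sketch below is illustrative only: it builds the constants $A$ to $E$ for the member with $\alpha = 0$, $\eta = 1$, $\lambda = 1$ from the summary values listed for Population 1, computes the optimum $k_1$ and $k_2$, and compares the closed form (2.19) with a direct evaluation of (2.18). The function name `srs_proposed` is hypothetical, and $C$ is taken as $2\gamma P_1V_{011} - P_1V_{110}$, the form that follows from Eq (2.17).

```python
def srs_proposed(alpha, eta, lam, N, n, Y_bar, P1, P2,
                 C_y, C_1, C_2, rho_y1, rho_y2, rho_12):
    """Optimum k1, k2 and minimum MSE of t_RPR under SRS, Eqs (2.17)-(2.19)."""
    f = 1 / n - 1 / N
    V200, V020, V002 = f * C_y**2, f * C_1**2, f * C_2**2
    V110 = f * rho_y1 * C_y * C_1
    V101 = f * rho_y2 * C_y * C_2
    V011 = f * rho_12 * C_1 * C_2
    g = eta * P2 / (2 * (eta * P2 + lam))            # gamma

    A = 1 - 4 * g * V101 - 2 * alpha * V002 * g**2 + 4 * g**2 * V002 + V200
    B = 2 - 2 * alpha * V002 * g**2 - 2 * g * V101 + 3 * V002 * g**2
    C = 2 * g * P1 * V011 - P1 * V110
    D = g * P1 * V011
    E = P1**2 * V020

    def mse(k1, k2):
        # Direct evaluation of Eq (2.18).
        return (Y_bar**2 + Y_bar**2 * k1**2 * A - k1 * Y_bar**2 * B
                + 2 * k1 * k2 * Y_bar * C - 2 * k2 * Y_bar * D + k2**2 * E)

    den = 2 * (A * E - C**2)
    k1 = (B * E - 2 * C * D) / den
    k2 = Y_bar * (2 * A * D - B * C) / den
    # Closed form of Eq (2.19); note 2 * den = 4 (AE - C^2).
    mse_min = Y_bar**2 * (1 - (4 * A * D**2 + B**2 * E - 4 * B * C * D) / (2 * den))
    return k1, k2, mse_min, mse

# Summary values listed for Population 1 (SRS case).
pop1 = dict(N=34, n=15, Y_bar=199.4412, P1=0.73529, P2=0.647059,
            C_y=0.7531, C_1=0.6090231, C_2=0.7496556,
            rho_y1=0.559, rho_y2=0.6281, rho_12=0.6729)
k1, k2, mse_min, mse = srs_proposed(alpha=0, eta=1, lam=1, **pop1)
```

    Evaluating `mse(k1, k2)` at the returned optimum should agree with `mse_min` up to rounding, and perturbing `k1` should increase the value, which is a quick sanity check on the algebra.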

    The minimum MSE of the proposed estimator $t_{RPR}$ in Eq (2.19) depends on many parametric constants; the shorthand constants $A$, $B$, $C$, $D$ and $E$ are used for notational convenience and readability.

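    The usual mean per unit estimator in two-phase sampling is:

    $t_U = \bar{y}$. (2.20)

    The MSE of $t_U$ is given by

    $\mathrm{MSE}(t_U) = \bar{Y}^2V_{200}$. (2.21)

    The Naik and Gupta [1] estimators in two-phase sampling are:

    $t_A = \bar{y}\left(\frac{p_1'}{p_1}\right)$, (2.22)
    $t_B = \bar{y}\left(\frac{P_2}{p_2'}\right)$, (2.23)
    $t_C = \bar{y}\exp\left(\frac{p_1' - p_1}{p_1' + p_1}\right)$, (2.24)
    $t_D = \bar{y}\exp\left(\frac{P_2 - p_2'}{P_2 + p_2'}\right)$. (2.25)

    The MSE expressions of the estimators $t_A$, $t_B$, $t_C$ and $t_D$ are, respectively:

    $\mathrm{MSE}(t_A) \approx \bar{Y}^2\left(V_{200} + V_{020} - V'_{020} + 2V'_{110} - 2V_{110}\right)$, (2.26)
    $\mathrm{MSE}(t_B) \approx \bar{Y}^2\left(V_{200} + V'_{002} + 2V'_{101}\right)$, (2.27)
    $\mathrm{MSE}(t_C) \approx \bar{Y}^2\left(V_{200} + V'_{110} - V_{110} - \frac{1}{4}V'_{020} + \frac{1}{4}V_{020}\right)$, (2.28)
    $\mathrm{MSE}(t_D) \approx \bar{Y}^2\left(V_{200} + \frac{1}{4}V'_{002} + V'_{101}\right)$, (2.29)

    where the primed terms are computed with the first-phase factor $f' = \frac{1}{n'} - \frac{1}{N}$ in place of $f$.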
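    The two-phase expressions (2.26)–(2.29) can be evaluated in the same way as in the single-phase case. The sketch below assumes that primed terms such as $V'_{020}$ use the first-phase factor $f' = \frac{1}{n'} - \frac{1}{N}$, and that the two sample sizes listed for Population 1 in the two-phase study are $n' = 15$ and $n = 3$ (the primes are not legible in the listing). The function name `two_phase_pre` is hypothetical. Under these assumptions the resulting values are close to the data set 1 entries for $t_A$, $t_B$, $t_C$ and $t_D$ in Table 4.

```python
def two_phase_pre(N, n_first, n, C_y, C_1, C_2, rho_y1, rho_y2):
    """PRE (relative to t_U) of t_A, t_B, t_C, t_D in two-phase sampling."""
    f = 1 / n - 1 / N               # second-phase factor f
    f1 = 1 / n_first - 1 / N        # first-phase factor f'
    V200 = f * C_y**2
    V020, V020p = f * C_1**2, f1 * C_1**2
    V110, V110p = f * rho_y1 * C_y * C_1, f1 * rho_y1 * C_y * C_1
    V002p = f1 * C_2**2
    V101p = f1 * rho_y2 * C_y * C_2
    # Relative MSEs, i.e. MSE / Y_bar^2, from Eqs (2.21) and (2.26)-(2.29).
    rel_mse = {
        "t_U": V200,
        "t_A": V200 + V020 - V020p + 2 * V110p - 2 * V110,
        "t_B": V200 + V002p + 2 * V101p,
        "t_C": V200 + V110p - V110 - V020p / 4 + V020 / 4,
        "t_D": V200 + V002p / 4 + V101p,
    }
    return {name: 100 * V200 / value for name, value in rel_mse.items()}

# Summary values listed for Population 1 (two-phase case, n' = 15, n = 3 assumed).
pre = two_phase_pre(N=34, n_first=15, n=3, C_y=0.7531, C_1=0.6090231,
                    C_2=0.7496556, rho_y1=0.559, rho_y2=0.6281)
```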

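    Malik [7] used an exponential-type estimator with regression coefficients in two-phase sampling, given by:

    $t_{MS} = \bar{y}\exp\left(\frac{p_1' - p_1}{p_1' + p_1}\right)^{\delta_1}\exp\left(\frac{P_2 - p_2'}{P_2 + p_2'}\right)^{\delta_2} + b_1(p_1' - p_1) + b_2(P_2 - p_2')$, (2.30)

    where $b_1 = \frac{s_{y\phi_1}}{s_{\phi_1}^2}$ and $b_2 = \frac{s_{y\phi_2}}{s_{\phi_2}^2}$ are the sample regression coefficients, and $\delta_1$ and $\delta_2$ are two unknown constants. The optimum values of these constants are:

    $\delta_{1(opt)} = \frac{2P_1\beta_1}{\bar{Y}} + \frac{2C_y\rho_{y\phi_1}}{C_{\phi_1}}$,
    $\delta_{2(opt)} = \frac{2P_2\beta_2}{\bar{Y}} + \frac{2C_y\rho_{y\phi_2}}{C_{\phi_2}}$,

    where $\beta_1 = \frac{S_{y\phi_1}}{S_{\phi_1}^2}$ and $\beta_2 = \frac{S_{y\phi_2}}{S_{\phi_2}^2}$ are the population regression coefficients.

    The minimum mean squared error at the optimum values of $\delta_1$ and $\delta_2$ is:

    $\mathrm{MSE}(t_{MS})_{\min} \approx f\bar{Y}^2C_y^2\left\{f\left(1 + \rho_{y\phi_1}^2\right) + \lambda\left(\rho_{y\phi_1}^2 - \rho_{y\phi_2}^2\right)\right\}$. (2.31)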

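    We suggest a generalized exponential estimator for the case where $P_1$ is unknown and $P_2$ is known:

    $t_{RPR} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[\alpha\left\{2 - \exp\left(\frac{\eta(p_2' - P_2)}{\eta(p_2' + P_2) + 2\lambda}\right)\right\} + (1 - \alpha)\exp\left(\frac{\eta(P_2 - p_2')}{\eta(P_2 + p_2') + 2\lambda}\right)\right]$, (2.32)

    where $k_1$ and $k_2$ are suitable constants whose values are to be determined such that the MSE of $t_{RPR}$ is minimum; $\eta$ and $\lambda$ are either real numbers or functions of known parameters of the auxiliary attribute $\phi_2$, such as the coefficient of variation ($C_{\phi_2}$) and the coefficient of kurtosis ($\beta_2(\phi_2)$); and $\alpha$ is a scalar $(0 \le \alpha \le 1)$ used to generate different estimators.

    Putting $\alpha = 1$ and $\alpha = 0$ in the suggested class of estimators, we get the following estimators.

    For $\alpha = 1$, the suggested class of estimators reduces to:

    $t_{RPR(\alpha=1)} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[2 - \exp\left\{\frac{\eta(p_2' - P_2)}{\eta(p_2' + P_2) + 2\lambda}\right\}\right]$.

    For $\alpha = 0$, the suggested class of estimators reduces to:

    $t_{RPR(\alpha=0)} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\exp\left\{\frac{\eta(P_2 - p_2')}{\eta(P_2 + p_2') + 2\lambda}\right\}$.

    Expressing (2.32) in terms of the errors, we have

    $t_{RPR} = \left[k_1\bar{Y}(1 + e_0) - k_2P_1e_1 + k_2P_1e_1'\right]\left[\alpha\left\{2 - \left(1 + \gamma e_2' - \frac{1}{2}\gamma^2e_2'^2\right)\right\} + (1 - \alpha)\left(1 - \gamma e_2' + \frac{3}{2}\gamma^2e_2'^2\right)\right]$, (2.33)

    where $\gamma = \frac{\eta P_2}{2(\eta P_2 + \lambda)}$ and $e_2' = \frac{p_2' - P_2}{P_2}$.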

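    To the first degree of approximation,

    $$\begin{aligned} t_{RPR} - \bar{Y} \approx{} & k_1\bar{Y} + k_1\bar{Y}e_0 - k_2P_1e_1 + k_2P_1e_1' - k_1\gamma\bar{Y}e_2' - \gamma\bar{Y}k_1e_2'e_0 \\ & + \frac{3}{2}\bar{Y}k_1e_2'^2\gamma^2 - \alpha\bar{Y}k_1e_2'^2\gamma^2 + \gamma k_2P_1e_1e_2' - \gamma k_2P_1e_1'e_2' - \bar{Y}. \end{aligned}$$ (2.34)

    Taking expectations on both sides of Eq (2.34), we have:

    $\mathrm{Bias}(t_{RPR}) \approx \bar{Y}(k_1 - 1) - \gamma\bar{Y}k_1V'_{101} + \bar{Y}k_1V'_{002}\gamma^2\left(\frac{3}{2} - \alpha\right)$. (2.35)

    Squaring Eq (2.34), taking expectations and neglecting higher powers, we get

    $$\begin{aligned} E(t_{RPR} - \bar{Y})^2 \approx{} & \bar{Y}^2 + k_1^2\bar{Y}^2\left(1 - 4\gamma V'_{101} - 2\alpha V'_{002}\gamma^2 + 4\gamma^2V'_{002} + V_{200}\right) \\ & + k_1\bar{Y}^2\left(-2 + 2\alpha V'_{002}\gamma^2 + 2\gamma V'_{101} - 3V'_{002}\gamma^2\right) \\ & + 2k_1k_2\bar{Y}\left(P_1V'_{110} - P_1V_{110}\right) + k_2^2\left(P_1^2V_{020} - P_1^2V'_{020}\right), \end{aligned}$$
    $\mathrm{MSE}(t_{RPR}) \approx \bar{Y}^2 + k_1^2\bar{Y}^2A + k_1\bar{Y}^2B + 2k_1k_2\bar{Y}C + k_2^2D$, (2.36)

    where

    $A = 1 - 4\gamma V'_{101} - 2\alpha V'_{002}\gamma^2 + 4\gamma^2V'_{002} + V_{200}$,
    $B = -2 + 2\alpha V'_{002}\gamma^2 + 2\gamma V'_{101} - 3V'_{002}\gamma^2$,
    $C = P_1V'_{110} - P_1V_{110}$, $D = P_1^2V_{020} - P_1^2V'_{020}$.

    The optimum values of $k_1$ and $k_2$, obtained by minimizing Eq (2.36), are:

    $k_1 = \frac{-BD}{2(AD - C^2)}$,
    $k_2 = \frac{\bar{Y}(BC)}{2(AD - C^2)}$.

    Substituting the optimum values of $k_1$ and $k_2$ in Eq (2.36), we get the minimum MSE of $t_{RPR}$ as:

    $\mathrm{MSE}(t_{RPR})_{\min} \approx \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right]$. (2.37)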
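    A numerical check of Eqs (2.36) and (2.37) is sketched below. It rests on assumptions that should be read as such: the terms in $A$ and $B$ that involve the second attribute are taken with the first-phase factor $f' = \frac{1}{n'} - \frac{1}{N}$, the last term of $B$ carries a negative sign (as follows from expanding Eq (2.34)), and the sample sizes for Population 1 are $n' = 15$ and $n = 3$. The function name `two_phase_proposed` is hypothetical. Under these assumptions the PREs for the members with $\eta = \lambda = 1$ come out near 159.6 ($\alpha = 0$) and 159.4 ($\alpha = 1$), close to the data set 1 entries for $t_{RPR4}$ and $t_{RPR8}$ in Table 4.

```python
def two_phase_proposed(alpha, eta, lam, N, n_first, n, Y_bar, P1, P2,
                       C_y, C_1, C_2, rho_y1, rho_y2):
    """Optimum k1, k2, minimum MSE and PRE of t_RPR in two-phase sampling."""
    f = 1 / n - 1 / N                # second-phase factor f
    f1 = 1 / n_first - 1 / N         # first-phase factor f' (assumed for primed terms)
    V200 = f * C_y**2
    V020, V020p = f * C_1**2, f1 * C_1**2
    V110, V110p = f * rho_y1 * C_y * C_1, f1 * rho_y1 * C_y * C_1
    V002p = f1 * C_2**2
    V101p = f1 * rho_y2 * C_y * C_2
    g = eta * P2 / (2 * (eta * P2 + lam))            # gamma

    A = 1 - 4 * g * V101p - 2 * alpha * V002p * g**2 + 4 * g**2 * V002p + V200
    B = -2 + 2 * alpha * V002p * g**2 + 2 * g * V101p - 3 * V002p * g**2
    C = P1 * V110p - P1 * V110
    D = P1**2 * V020 - P1**2 * V020p

    den = 2 * (A * D - C**2)
    k1 = -B * D / den
    k2 = Y_bar * B * C / den
    mse_min = Y_bar**2 * (1 - B**2 * D / (2 * den))  # Eq (2.37)
    pre = 100 * Y_bar**2 * V200 / mse_min            # relative to t_U
    return k1, k2, mse_min, pre

# Summary values listed for Population 1 (two-phase case, n' = 15, n = 3 assumed).
pop1 = dict(N=34, n_first=15, n=3, Y_bar=199.4412, P1=0.73529, P2=0.647059,
            C_y=0.7531, C_1=0.6090231, C_2=0.7496556,
            rho_y1=0.559, rho_y2=0.6281)
k1_4, k2_4, mse4, pre4 = two_phase_proposed(alpha=0, eta=1, lam=1, **pop1)
k1_8, k2_8, mse8, pre8 = two_phase_proposed(alpha=1, eta=1, lam=1, **pop1)
```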

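    In this section we compare theoretically the minimum MSE of the proposed parent family of estimators $t_{RPR}$ under simple random sampling with the MSEs of the existing estimators.

    Comparison with the usual mean per unit estimator:

    (i) $\mathrm{MSE}(t_U) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2V_{200} - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    Comparison with the Naik and Gupta [1] estimators:

    (ii) $\mathrm{MSE}(t_A) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} - 2V_{110} + V_{020}\right) - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    (iii) $\mathrm{MSE}(t_B) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + 2V_{101} + V_{002}\right) - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    (iv) $\mathrm{MSE}(t_C) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} - V_{110} + \frac{1}{4}V_{020}\right) - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    (v) $\mathrm{MSE}(t_D) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + V_{101} + \frac{1}{4}V_{002}\right) - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    Comparison with the Malik [7] estimator:

    (vi) $\mathrm{MSE}(t_{MS}) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $f\bar{Y}^2C_y^2\left(1 - R_{y\phi_1\phi_2}^2\right) - \bar{Y}^2\left[1 - \frac{4AD^2 + B^2E - 4BCD}{4(AE - C^2)}\right] \ge 0$.

    The proposed estimators perform better than the existing estimators whenever conditions (i)–(vi) above are satisfied.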

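    In this section we compare theoretically the minimum MSE of the proposed parent family of estimators $t_{RPR}$ in two-phase sampling with the MSEs of the existing estimators.

    Comparison with the usual mean per unit estimator:

    (i) $\mathrm{MSE}(t_U) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2V_{200} - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    Comparison with the Naik and Gupta [1] estimators:

    (ii) $\mathrm{MSE}(t_A) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + V_{020} - V'_{020} + 2V'_{110} - 2V_{110}\right) - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    (iii) $\mathrm{MSE}(t_B) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + V'_{002} + 2V'_{101}\right) - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    (iv) $\mathrm{MSE}(t_C) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + V'_{110} - V_{110} - \frac{1}{4}V'_{020} + \frac{1}{4}V_{020}\right) - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    (v) $\mathrm{MSE}(t_D) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $\bar{Y}^2\left(V_{200} + \frac{1}{4}V'_{002} + V'_{101}\right) - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    Comparison with the Malik [7] estimator:

    (vi) $\mathrm{MSE}(t_{MS}) - \mathrm{MSE}(t_{RPRi})_{\min} \ge 0$ $(i = 1, 2, \ldots, 8)$, if $f\bar{Y}^2C_y^2\left\{f\left(1 + \rho_{y\phi_1}^2\right) + \lambda\left(\rho_{y\phi_1}^2 - \rho_{y\phi_2}^2\right)\right\} - \bar{Y}^2\left[1 - \frac{B^2D}{4(AD - C^2)}\right] \ge 0$.

    The proposed estimators perform better than the existing estimators whenever conditions (i)–(vi) above are satisfied.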

    Population 1. [19]

    Let $Y$, the study variable, be the cultivated area of wheat in 1964.

    $P_1$ is the proportion with cultivated area of wheat greater than 100 acres in 1963.

    $P_2$ is the proportion with cultivated area of wheat greater than 500 in 1961.

    $N = 34$, $n = 15$, $\bar{Y} = 199.4412$, $P_1 = 0.73529$, $P_2 = 0.647059$, $S_y = 150.215$, $S_{\phi_1} = 0.4478111$, $S_{\phi_2} = 0.4850713$, $\beta_2(\phi_2) = 1.688$, $C_{\phi_1} = 0.6090231$, $\rho_{\phi_1\phi_2} = 0.6729$, $C_{\phi_2} = 0.7496556$, $C_y = 0.7531$, $\rho_{y\phi_2} = 0.6281$, $\rho_{y\phi_1} = 0.559$.

    Population 2. [20]

    Let $Y$, the study variable, be the number of fish caught in 1995.

    $P_1$ is the proportion with number of fish caught greater than 1000 in 1993.

    $P_2$ is the proportion with number of fish caught greater than 2000 in 1994.

    $N = 69$, $n = 14$, $\bar{Y} = 4514.89$, $P_1 = 0.7391304$, $P_2 = 0.5507246$, $S_y = 6099.14$, $S_{\phi_1} = 0.4423259$, $S_{\phi_2} = 0.5010645$, $\beta_2(\phi_2) = 2.015$, $C_{\phi_1} = 0.5984409$, $\rho_{\phi_1\phi_2} = 0.6577519$, $C_{\phi_2} = 0.9098277$, $C_y = 1.350$, $\rho_{y\phi_2} = 0.538047$, $\rho_{y\phi_1} = 0.3966081$.

    Population 3. [21]

    Let $Y$, the study variable, be the tobacco area production in hectares during the year 2009.

    $P_1$ is the proportion of farms with tobacco cultivation area greater than 500 hectares during the year 2007.

    $P_2$ is the proportion of farms with tobacco cultivation area greater than 800 hectares during the year 2008, for 47 districts of Pakistan.

    $N = 47$, $n = 10$, $\bar{Y} = 1004.447$, $P_1 = 0.4255319$, $P_2 = 0.3829787$, $S_y = 2351.656$, $S_{\phi_1} = 0.499$, $S_{\phi_2} = 0.4850713$, $\beta_2(\phi_2) = 1.8324$, $C_{\phi_1} = 1.174456$, $\rho_{\phi_1\phi_2} = 0.9153857$, $C_{\phi_2} = 1.283018$, $C_y = 2.341245$, $\rho_{y\phi_2} = 0.4661508$, $\rho_{y\phi_1} = 0.4395989$.

    Population 4. [21]

    Let $Y$, the study variable, be the cotton production in hectares during the year 2009.

    $P_1$ is the proportion of farms with cotton cultivation area greater than 37 hectares during the year 2007.

    $P_2$ is the proportion of farms with cotton cultivation area greater than 35 hectares during the year 2008, for 52 districts of Pakistan.

    $N = 52$, $n = 11$, $\bar{Y} = 50.03846$, $P_1 = 0.3846154$, $P_2 = 0.4423077$, $S_y = 71.13086$, $S_{\phi_1} = 0.4912508$, $S_{\phi_2} = 0.501506$, $\beta_2(\phi_2) = 1.62014$, $C_{\phi_1} = 1.277252$, $\rho_{\phi_1\phi_2} = 0.8877181$, $C_{\phi_2} = 1.13384$, $C_y = 1.421524$, $\rho_{y\phi_2} = 0.6935718$, $\rho_{y\phi_1} = 0.7369579$.

    We use the following expression to obtain the percentage relative efficiency (PRE):

    $\mathrm{PRE} = \frac{\mathrm{MSE}(t_0)}{\mathrm{MSE}(t_i)_{\min}} \times 100$, (4.1)

    where $t_0 = t_U$ is the usual mean per unit estimator and $i$ = U, A, B, C, D, MS, RPR1, RPR2, RPR3, RPR4, RPR5, RPR6, RPR7 and RPR8.

    Table 3 shows that the suggested class of estimators $t_{RPRi}$ performs better than all the existing estimators $t_A$, $t_B$, $t_C$, $t_D$ and $t_{MS}$. A marked increase in percentage relative efficiency is observed for the estimators $t_{RPR6}$, $t_{RPR7}$ and $t_{RPR8}$.

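    Table 2.  Set of estimators generated from the two-phase estimator $t_{RPR}$.
    Subset of proposed estimator | $\alpha$ | $\eta$ | $\lambda$
    $t_{RPR1} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\exp\left\{\frac{C_{\phi_2}(P_2 - p_2')}{C_{\phi_2}(P_2 + p_2') + 2\beta_2(\phi_2)}\right\}$ | 0 | $C_{\phi_2}$ | $\beta_2(\phi_2)$
    $t_{RPR2} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\exp\left\{\frac{P_2(P_2 - p_2')}{P_2(P_2 + p_2') + 2}\right\}$ | 0 | $P_2$ | 1
    $t_{RPR3} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\exp\left\{\frac{P_2 - p_2'}{(P_2 + p_2') + 2C_{\phi_2}}\right\}$ | 0 | 1 | $C_{\phi_2}$
    $t_{RPR4} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\exp\left\{\frac{P_2 - p_2'}{(P_2 + p_2') + 2}\right\}$ | 0 | 1 | 1
    $t_{RPR5} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[2 - \exp\left\{\frac{C_{\phi_2}(p_2' - P_2)}{C_{\phi_2}(p_2' + P_2) + 2\beta_2(\phi_2)}\right\}\right]$ | 1 | $C_{\phi_2}$ | $\beta_2(\phi_2)$
    $t_{RPR6} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[2 - \exp\left\{\frac{P_2(p_2' - P_2)}{P_2(p_2' + P_2) + 2}\right\}\right]$ | 1 | $P_2$ | 1
    $t_{RPR7} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[2 - \exp\left\{\frac{p_2' - P_2}{(p_2' + P_2) + 2C_{\phi_2}}\right\}\right]$ | 1 | 1 | $C_{\phi_2}$
    $t_{RPR8} = \left[k_1\bar{y} - k_2(p_1 - p_1')\right]\left[2 - \exp\left\{\frac{p_2' - P_2}{(p_2' + P_2) + 2}\right\}\right]$ | 1 | 1 | 1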


    Population 1. [19]

    Let $Y$, the study variable, be the cultivated area of wheat in 1964.

    $P_1$ is the proportion with cultivated area of wheat greater than 100 acres in 1963.

    $P_2$ is the proportion with cultivated area of wheat greater than 500 in 1961.

    $N = 34$, $n' = 15$, $n = 3$, $\bar{Y} = 199.4412$, $P_1 = 0.73529$, $P_2 = 0.647059$, $S_y = 150.215$, $S_{\phi_1} = 0.4478111$, $S_{\phi_2} = 0.4850713$, $\beta_2(\phi_2) = 1.688$, $C_{\phi_1} = 0.6090231$, $\rho_{\phi_1\phi_2} = 0.6729$, $C_{\phi_2} = 0.7496556$, $C_y = 0.7531$, $\rho_{y\phi_2} = 0.6281$, $\rho_{y\phi_1} = 0.559$.

    Population 2. [20]

    Let $Y$, the study variable, be the number of fish caught in 1995.

    $P_1$ is the proportion with number of fish caught greater than 1000 in 1993.

    $P_2$ is the proportion with number of fish caught greater than 2000 in 1994.

    $N = 69$, $n' = 20$, $n = 7$, $\bar{Y} = 4514.89$, $P_1 = 0.7391304$, $P_2 = 0.5507246$, $S_y = 6099.14$, $S_{\phi_1} = 0.4423259$, $S_{\phi_2} = 0.5010645$, $\beta_2(\phi_2) = 2.015$, $C_{\phi_1} = 0.5984409$, $\rho_{\phi_1\phi_2} = 0.6577519$, $C_{\phi_2} = 0.9098277$, $C_y = 1.350$, $\rho_{y\phi_2} = 0.538047$, $\rho_{y\phi_1} = 0.3966081$.

    Population 3. [21]

    Let $Y$, the study variable, be the tobacco area production in hectares during the year 2009.

    $P_1$ is the proportion of farms with tobacco cultivation area greater than 500 hectares during the year 2007.

    $P_2$ is the proportion of farms with tobacco cultivation area greater than 800 hectares during the year 2008, for 47 districts of Pakistan.

    $N = 47$, $n' = 15$, $n = 7$, $\bar{Y} = 1004.447$, $P_1 = 0.4255319$, $P_2 = 0.3829787$, $S_y = 2351.656$, $S_{\phi_1} = 0.49$, $S_{\phi_2} = 0.4850713$, $\beta_2(\phi_2) = 1.8324$, $C_{\phi_1} = 1.174456$, $\rho_{\phi_1\phi_2} = 0.9153857$, $C_{\phi_2} = 1.283018$, $C_y = 2.341245$, $\rho_{y\phi_2} = 0.4661508$, $\rho_{y\phi_1} = 0.4395989$.

    Population 4. [21]

    Let $Y$, the study variable, be the cotton production in hectares during the year 2009.

    $P_1$ is the proportion of farms with cotton cultivation area greater than 37 hectares during the year 2007.

    $P_2$ is the proportion of farms with cotton cultivation area greater than 35 hectares during the year 2008, for 52 districts of Pakistan.

    $N = 52$, $n' = 11$, $n = 3$, $\bar{Y} = 50.03846$, $P_1 = 0.3846154$, $P_2 = 0.4423077$, $S_y = 71.13086$, $S_{\phi_1} = 0.4912508$, $S_{\phi_2} = 0.501506$, $\beta_2(\phi_2) = 1.62014$, $C_{\phi_1} = 1.277252$, $\rho_{\phi_1\phi_2} = 0.8877181$, $C_{\phi_2} = 1.13384$, $C_y = 1.421524$, $\rho_{y\phi_2} = 0.6935718$, $\rho_{y\phi_1} = 0.7369579$.

    We use the following expression to obtain the percentage relative efficiency (PRE):

    $\mathrm{PRE} = \frac{\mathrm{MSE}(t_0)}{\mathrm{MSE}(t_i)_{\min}} \times 100$, (4.2)

    where $t_0 = t_U$ is the usual mean per unit estimator and $i$ = U, A, B, C, D, MS, RPR1, RPR2, RPR3, RPR4, RPR5, RPR6, RPR7 and RPR8.

    The results for data sets 1–4 are given in Table 4.

    Table 4 shows that the suggested class of estimators $t_{RPR}$ performs better than all the existing estimators $t_A$, $t_B$, $t_C$, $t_D$ and $t_{MS}$. A marked increase in percentage relative efficiency is observed for the estimators $t_{RPR6}$, $t_{RPR7}$ and $t_{RPR8}$.

    There are many situations in which we are interested in the study variable but observing it completely is too difficult. In such cases, two auxiliary variables in the form of proportions can be used to estimate the study variable. This manuscript provides basic tools for problems related to estimation with proportions and two-phase sampling. In the abstract of the manuscript we discuss only the minimum MSE of the proposed and existing estimators; the reason is that the minimum MSE can easily be compared with other properties of good estimators, such as those of the MLE. The comparison is also made in terms of percentage relative efficiency.

    Statisticians are constantly trying to develop efficient estimators and estimation methodologies to increase the efficiency of estimates, and work on estimators of the population mean is ongoing. In the present paper our task is to develop new estimators of the finite population mean under two different sampling schemes, simple random sampling and two-phase sampling. The new estimators are proposed for the following situations:

    1). The initial sample is collected through simple random sampling.

    2). The sample is then collected by two-phase sampling using simple random sampling.

    In this article, we consider the problem of estimating the finite population mean using auxiliary proportions under simple random sampling and two-phase sampling schemes. In surveys, it is often observed that the required information is not obtained on the first attempt, even after some call-backs; in such cases we use simple random sampling, and when the required results are still not obtained, we use two-phase sampling. These approaches are used to obtain as much information as possible. In sample surveys, it is well known that when estimating finite population parameters (mean, median, quartiles, coefficient of variation and distribution function), information on an auxiliary variable (here a proportion) is usually used to improve the efficiency of the estimators. The main aim of this study is to find estimators that are more efficient than the classical and recently proposed estimators, using auxiliary information in the form of proportions to estimate the finite population mean under simple random sampling and two-phase sampling schemes.

    There are situations where our work is deemed necessary and can be used in daily life.

    1). For a nutritionist, it is interesting to know the proportion of population that consumes 25% or more of the calorie intake from saturated fat.

    2). Similarly, a soil scientist may be interested in estimating the distribution of clay percent in the soil.

    3). In addition, policy-makers may be interested in knowing the proportion of people living in a developing country below the poverty line.

    In this paper, we have proposed a generalized class of exponential ratio-type estimators for estimating the population mean using auxiliary information in the form of proportions under simple random sampling and two-phase sampling. We used SRS to estimate the population mean using the proportions of the available auxiliary information, and when the auxiliary information is unknown, we used two-phase sampling. From the numerical results in Tables 3 and 4, we can see that two-phase sampling gave more efficient results than simple random sampling. The use of auxiliary information in the estimation process increases the efficiency of the estimator, which is why we have used two auxiliary variables in the form of attributes. In the numerical study we showed that the proposed estimator is more efficient than $t_U$, $t_A$, $t_B$, $t_C$, $t_D$, $t_{MS}$ and any other suggested family of estimators in both simple random sampling and two-phase sampling schemes.

    Table 3.  Percentage relative efficiency (PRE) with respect to usual mean estimator tU.
    Estimator   Data set 1   Data set 2   Data set 3   Data set 4
    tU          100          100          100          100
    tA          133.37       118.36       123.36       207.04
    tB          30.84        45.95        55.04        36.46
    tC          140.5        114.40       118.75       185.29
    tD          55.39        67.77        75.01        58.40
    tMS         139.06       110.94       105.64       146.93
    tRPR1       125.98       134.42       165.4        225.72
    tRPR2       106.66       134.57       167.49       235.43
    tRPR3       111.08       137.09       167.89       235.65
    tRPR4       109.10       137.39       167.82       233.82
    tRPR5       125.75       120.83       166.47       223.72
    tRPR6       161.80       137.02       167.63       235.16
    tRPR7       168.29       137.47       168.16       235.96
    tRPR8       165.09       134.42       168.01       235.93

    Table 4.  Percentage relative efficiency (PRE) with respect to usual mean estimator tU.
    Estimator   Data set 1   Data set 2   Data set 3   Data set 4
    tU          100          100          100          100
    tA          128.13       112.64       113.46       166.39
    tB          78.44        75.41        76.75        71.54
    tC          133.90       110.08       110.95       155.10
    tD          90.33        88.37        89.01        86.01
    tMS         134.80       111.05       133.6        209.34
    tRPR1       149.36       131.69       178.24       225.68
    tRPR2       158.43       138.50       181.06       239.47
    tRPR3       160.04       139.77       181.55       242.17
    tRPR4       159.63       139.58       181.78       242.08
    tRPR5       149.59       131.98       178.95       227.06
    tRPR6       158.56       138.59       181.13       239.69
    tRPR7       160.36       139.77       181.13       242.80
    tRPR8       159.40       139.99       182.10       242.70
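
    The PRE values in Tables 3 and 4 can be read with the usual definition PRE(t, tU) = 100 × Var(tU) / MSE(t), so a value above 100 means that the estimator is more efficient than tU. The short Python sketch below assumes that definition (it is not restated in this section) and then picks the estimator with the largest PRE for each data set, using a few rows copied from Table 4. The function and variable names are illustrative.

```python
def pre(mse_reference, mse_estimator):
    """Percentage relative efficiency of an estimator with respect to a reference.

    Assumed definition: 100 * MSE(reference) / MSE(estimator).
    """
    return 100.0 * mse_reference / mse_estimator


# A subset of the rows of Table 4 (columns: Data set 1 to Data set 4).
table4 = {
    "tU":    [100, 100, 100, 100],
    "tMS":   [134.80, 111.05, 133.6, 209.34],
    "tRPR3": [160.04, 139.77, 181.55, 242.17],
    "tRPR7": [160.36, 139.77, 181.13, 242.80],
    "tRPR8": [159.40, 139.99, 182.10, 242.70],
}


def best_per_dataset(table):
    """Return, for each data set column, the estimator name with the largest PRE."""
    n_sets = len(next(iter(table.values())))
    return [max(table, key=lambda name: table[name][j]) for j in range(n_sets)]


print(pre(2.0, 1.0))             # halving the MSE doubles the efficiency
print(best_per_dataset(table4))
```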


    Some possible extensions of the current work are as follows:

    Develop improved finite population mean estimators:

    1). using supplementary information on more than one auxiliary variable.

    2). under stratified two-phase sampling.

    3). in the presence of measurement errors.

    4). under non-response with two-phase sampling.

    The authors are thankful to the learned referee for his useful comments and suggestions.

    The authors declare no conflict of interest.



    [1] An Article to Understand Ransomware Attacks: Characteristics, Trends and Challenges. Available from: https://www.secrss.com/articles/33928
    [2] D. J. Du, M. G. Zhu, M. R. Fei, M. Fei, S. Bu, L. Wu, et al., A Review on cybersecurity analysis, attack detection, and attack defense methods in cyber-physical power systems, J. Mod. Power Syst. Clean Energy, 2022 (2022), 1–18. https://doi.org/10.35833/MPCE.2021.000604 doi: 10.35833/MPCE.2021.000604
    [3] Ransomware Attack Forces Shutdown of Largest Fuel Pipeline in the U.S. Available from: https://www.cnbc.com/2021/05/08/colonial-pipeline-shuts-pipeline-operations-after-cyberattack.html
    [4] P. R. Kanna, P. Santhi, Unified deep learning approach for efficient intrusion detection system using integrated spatial–temporal features, Knowl. Based Syst., 226 (2021), 107132. https://doi.org/10.1016/j.knosys.2021.107132 doi: 10.1016/j.knosys.2021.107132
    [5] M. Bijone, A survey on secure network: intrusion detection & prevention approaches, Am. J. Inf. Syst., 4 (2016), 69–88. https://doi.org/10.12691/ajis-4-3-2 doi: 10.12691/ajis-4-3-2
    [6] A. Khraisat, I. Gondal, P. Vamplew, J. Kamruzzaman, Survey of intrusion detection systems: techniques, datasets and challenges, Cybersecurity, 2 (2019), 1–22. https://doi.org/10.1186/s42400-019-0038-7 doi: 10.1186/s42400-018-0018-3
    [7] A. Thakkar, R. Lohiya, A review of the advancement in intrusion detection datasets, Procedia Comput. Sci., 167 (2020), 636–645. https://doi.org/10.1016/j.procs.2020.03.330 doi: 10.1016/j.procs.2020.03.330
    [8] C. Guo, Y. Ping, N. Liu, S. S. Luo, A two-level hybrid approach for intrusion detection, Neurocomputing, 214 (2016), 391–400. https://doi.org/10.1016/j.neucom.2016.06.021 doi: 10.1016/j.neucom.2016.06.021
    [9] Intrusion Detection System. Available from: https://blog.51cto.com/u_12632800/4810474
    [10] I. F. Kilincer, F. Ertam, A. Sengur, Machine learning methods for cyber security intrusion detection: Datasets and comparative study, Comput. Networks, 188 (2021), 107840. https://doi.org/10.1016/j.comnet.2021.107840 doi: 10.1016/j.comnet.2021.107840
    [11] X. Xue, Y. Jia, Y. Tang, Expressway project cost estimation with a convolutional neural network model, IEEE Access, 8 (2020), 217848–217866. https://doi.org/10.1109/ACCESS.2020.3042329 doi: 10.1109/ACCESS.2020.3042329
    [12] N. Sameera, M. Shashi, Encoding approach for intrusion detection using PCA and KNN classifier, in Proceedings of the Third International Conference on Computational Intelligence and Informatics, 1090 (2020), 187–199. https://doi.org/10.1007/978-981-15-1480-7_15
    [13] J. Kevric, J. Samed, S. Abdulhamit, An effective combining classifier approach using tree algorithms for network intrusion detection, Neural Comput. Appl., 28 (2017), 1051–1058. https://doi.org/10.1007/s00521-016-2418-1 doi: 10.1007/s00521-016-2418-1
    [14] M. Yousefnezhad, J. Hamidzadeh, M. Aliannejadi, Ensemble classification for intrusion detection via feature extraction based on deep Learning, Soft Comput., 25 (2021), 12667–12683. https://doi.org/10.1007/s00500-021-06067-8 doi: 10.1007/s00500-021-06067-8
    [15] R. Swami, M. Dave, V. Ranga, Voting-based intrusion detection framework for securing software-defined networks, Concurrency Comput. Pract. Exper., 32 (2020), e5927. https://doi.org/10.1002/cpe.5927 doi: 10.1002/cpe.5927
    [16] A. Basati, M. M. Faghih, PDAE: Efficient network intrusion detection in IoT using parallel deep auto-encoders, Inf. Sci., 598 (2022), 57–74. https://doi.org/10.1016/j.ins.2022.03.065 doi: 10.1016/j.ins.2022.03.065
    [17] A. S. Almogren, Intrusion detection in edge-of-things computing, J. Parallel Distrib. Comput., 137 (2020), 259–265. https://doi.org/10.1016/j.jpdc.2019.12.008 doi: 10.1016/j.jpdc.2019.12.008
    [18] M. S. ElSayed, N. Le-Khac, M. A. Albahar, A. Jurcut, A novel hybrid model for intrusion detection systems in SDNs based on CNN and a new regularization technique, J. Network Comput. Appl., 191 (2021), 1–18. https://doi.org/10.1016/j.jnca.2021.103160 doi: 10.1016/j.jnca.2021.103160
    [19] N. Chouhan, A. Khan, Network anomaly detection using channel boosted and residual learning based deep convolutional neural network, Appl. Soft Comput., 83 (2019), 1–18. https://doi.org/10.1016/j.asoc.2019.105612 doi: 10.1016/j.asoc.2019.105612
    [20] G. Andresini, A. Appice, N. D. Mauro, C. Loglisci, D. Malerba, Exploiting the auto-encoder residual error for intrusion detection, in 2019 IEEE European Symposium on Security and Privacy Workshops (EuroS & PW), (2019), 281–290. https://doi.org/10.1109/EuroSPW.2019.00038
    [21] R. C. Aygun, A. G. Yavuz, Network anomaly detection with stochastically improved autoencoder based models, in 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), (2017), 192–198. https://doi.org/10.1109/CSCloud.2017.39
    [22] Y. Yang, K. Zheng, C. Wu, Y. Yang, Improving the classification effectiveness of intrusion detection by using improved conditional variational autoencoder and deep neural network, Sensors, 19 (2019), 2528. https://doi.org/10.3390/s19112528 doi: 10.3390/s19112528
    [23] B. Min, J. Yoo, S. Kim, D. Shin, Network anomaly detection using memory-augmented deep autoencoder, IEEE Access, 9 (2021), 104695–104706. https://doi.org/10.1109/ACCESS.2021.3100087 doi: 10.1109/ACCESS.2021.3100087
    [24] E. Mushtaq, A. Zameer, M. Umer, A. A. Abbas, A two-stage intrusion detection system with auto-encoder and LSTMs, Appl. Soft Comput., 121 (2022), 1–16. https://doi.org/10.1016/j.asoc.2022.108768 doi: 10.1016/j.asoc.2022.108768
    [25] M. Al-Qatf, Y. Lasheng, M. Al-Habib, K. Al-Sabahi, Deep learning approach combining sparse autoencoder with SVM for network intrusion detection, IEEE Access, 6 (2018), 52843–52856. https://doi.org/10.1109/ACCESS.2018.2869577 doi: 10.1109/ACCESS.2018.2869577
    [26] M. Belouch, S. E. Hadaj, M. Idhammad, A two-stage classifier approach using reptree algorithm for network intrusion detection, Int. J. Adv. Comput. Sci. Appl., 8 (2017), 389–394. https://doi.org/10.14569/IJACSA.2017.080651 doi: 10.14569/IJACSA.2017.080651
    [27] A. Javaid, W. Q. Sun, A. Y. Javaid, M. Alam, A deep learning approach for network intrusion detection system, in Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies (formerly BIONETICS), 3 (2016), 1–6. http://dx.doi.org/10.4108/eai.3-12-2015.2262516
    [28] L. X. Zhang, D. Ma, A hybrid approach toward efficient and accurate intrusion detection for in-vehicle networks, IEEE Access, 10 (2022), 10852–10866. http://dx.doi.org/10.1109/ACCESS.2022.3145007 doi: 10.1109/ACCESS.2022.3145007
    [29] J. Gu, L. H. Wang, H. W. Wang, S. S. Wang, A novel approach to intrusion detection using SVM ensemble with feature augmentation, Comput. Secur., 86 (2019), 53–62. https://doi.org/10.1016/j.cose.2019.05.022 doi: 10.1016/j.cose.2019.05.022
    [30] C. Ieracitano, A. Adeel, F. C. Morabito, A. Hussain, A novel statistical analysis and autoencoder driven intelligent intrusion detection approach, Neurocomputing, 387 (2020), 51–62. https://doi.org/10.1016/j.neucom.2019.11.016 doi: 10.1016/j.neucom.2019.11.016
    [31] H. Zhang, J. L. Li, X. M. Liu, C. Dong, Multi-dimensional feature fusion and stacking ensemble mechanism for network intrusion detection, Future Gener. Comput. Syst., 122 (2021), 130–143. https://doi.org/10.1016/j.future.2021.03.024 doi: 10.1016/j.future.2021.03.024
    [32] S. M. Kasongo, Y. X. Sun, Performance analysis of intrusion detection systems using a feature selection method on the UNSW-NB15 dataset, J. Big Data, 7 (2020), 1–20. https://doi.org/10.1186/s40537-020-00379-6 doi: 10.1186/s40537-019-0278-0
    [33] A. A. Megantara, T. Ahmad, A hybrid machine learning method for increasing the performance of network intrusion detection systems, J. Big Data, 8 (2021), 1–19. https://doi.org/10.1186/s40537-021-00531-w doi: 10.1186/s40537-020-00387-6
    [34] M. Rashid, J. Kamruzzaman, T. Imam, S. Wibowo, S. Gordon, A tree-based stacking ensemble technique with feature selection for network intrusion detection, Appl. Intell., 52 (2022), 1–14. https://doi.org/10.1007/s10489-021-02968-1 doi: 10.1007/s10489-021-02377-4
    [35] A. Chohra, P. Shirani, E. B. Karbab, M. Debbabi, Chameleon: Optimized feature selection using particle swarm optimization and ensemble methods for network anomaly detection, Comput. Secur., 117 (2022), 102684. https://doi.org/10.1016/j.cose.2022.102684 doi: 10.1016/j.cose.2022.102684
    [36] B. Y. Tama, M. Comuzzi, K. H. Rhee, TSE-IDS: A two-stage classifier ensemble for intelligent anomaly-based intrusion detection system, IEEE Access, 7 (2019), 94497–94507. https://doi.org/10.1109/ACCESS.2019.2928048 doi: 10.1109/ACCESS.2019.2928048
    [37] B. I. Seraphim, E. Poovammal, K. Ramana, N. Kryvinska, N. Penchalaiah, A hybrid network intrusion detection using darwinian particle swarm optimization and stacked autoencoder hoeffding tree, Math. Biosci. Eng., 18 (2021), 8024–8044. https://doi.org/10.3934/mbe.2021398 doi: 10.3934/mbe.2021398
    [38] S. Seo, S. Park, J. Kim, Improvement of network intrusion detection accuracy by using restricted Boltzmann machine, in 2016 8th International Conference on Computational Intelligence and Communication Networks (CICN), (2016), 413–417. https://doi.org/10.1109/CICN.2016.87
    [39] W. Li, G. Yin, X. Chen, Application of deep extreme learning machine in network intrusion detection systems, IAENG Int. J. Comput. Sci., 47 (2020), 136–143.
    [40] Z. R. Zhao, L. N. Ge, G. F. Zhang, A novel DBN-LSSVM ensemble method for intrusion detection system, in 2021 9th International Conference on Communications and Broadband Networking, (2021), 101–107. https://doi.org/10.1145/3456415.3456431
    [41] H. Zhang, L. N. Ge, Z. Wang, A high performance intrusion detection system using LightGBM based on oversampling and undersampling, in International Conference on Intelligent Computing, 13393 (2022), 638–652. https://doi.org/10.1007/978-3-031-13870-6_53
    [42] G. L. Ke, Q. Meng, T. Finley, T. F. Wang, W. Cheng, W. D. Ma, et al., Lightgbm: A highly efficient gradient boosting decision tree, Adv. Neural Inf. Process. Syst., 30 (2017).
    [43] T. Chen, C. Guestrin, Xgboost: A scalable tree boosting system, in Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, (2016), 785–794. https://doi.org/10.1145/2939672.2939785
    [44] K. Mo, J. Li, A deep auto-encoder based LightGBM approach for network intrusion detection system, in Proceedings of the International Conference on Advances in Computer Technology, Information Science and Communications, (2019), 142–147. http://doi.org/10.5220/0008098401420147
    [45] T. Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollar, Focal loss for dense object detection, in Proceedings of the IEEE International Conference on Computer Vision, (2017), 2980–2988.
    [46] Q. Liu, D. Wang, Y. Jia, S. Luo, C. Wang, A multi-task based deep learning approach for intrusion detection, Knowl. Based Syst., 238 (2022), 1–12. https://doi.org/10.1016/j.knosys.2021.107852 doi: 10.1016/j.knosys.2021.107852
    [47] N. Shone, T. N. Ngoc, V. D. Phai, Q. Shi, A deep learning approach to network intrusion detection, IEEE Trans. Emerging Top. Comput. Intell., 2 (2018), 41–50. https://doi.org/10.1109/TETCI.2017.2772792 doi: 10.1109/TETCI.2017.2772792
    [48] S. Naseer, Y. Saleem, S. Khalid, M. K. Bashir, J. Han, M. M. Iqbal, et al., Enhanced network anomaly detection based on deep neural networks, IEEE Access, 6 (2018), 48231–48246. https://doi.org/10.1109/ACCESS.2018.2863036 doi: 10.1109/ACCESS.2018.2863036
    [49] M. Tavallaee, E. Bagheri, W. Lu, A. A. Ghorbani, A detailed analysis of the KDD CUP 99 data set, in 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, (2009), 1–6. https://doi.org/10.1109/CISDA.2009.5356528
    [50] N. Moustafa, J. Slay, UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set), in 2015 Military Communications and Information Systems Conference (MilCIS), (2015), 1–6. https://doi.org/10.1109/MilCIS.2015.7348942
    [51] N. Moustafa, J. Slay, The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set, Inf. Secur. J. Global Perspect., 25 (2016), 18–31. http://dx.doi.org/10.1080/19393555.2015.1125974 doi: 10.1080/19393555.2015.1125974
    [52] W. J. Lian, G. Q. Nie, B. Jia, D. D. Shi, Q. Fan, Y. Q. Liang, An intrusion detection method based on decision tree-recursive feature elimination in ensemble learning, Math. Prob. Eng., 2020 (2020). https://doi.org/10.1155/2020/2835023 doi: 10.1155/2020/2835023
    [53] LightGBM. Available from: https://lightgbm.readthedocs.io/
    [54] N. Moustafa, J. Slay, G. Creech, Novel geometric area analysis technique for anomaly detection using trapezoidal area estimation on large-scale networks, IEEE Trans. Big Data, 5 (2017), 481–494. https://doi.org/10.1109/TBDATA.2017.2715166 doi: 10.1109/TBDATA.2017.2715166
    [55] B. A. Tama, K. H. Rhee, An in-depth experimental study of anomaly detection using gradient boostedmachine, Neural Comput. Appl., 31 (2019), 955–965. https://doi.org/10.1007/s00521-017-3128-z doi: 10.1007/s00521-017-3128-z
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)