Research article

A discrete mixed distribution: Statistical and reliability properties with applications to model COVID-19 data in various countries


  • The aim of this paper is to introduce a discrete mixture model and to study it, both theoretically and practically, from the point of view of reliability and order statistics for modeling data that contain extreme observations and outliers. The base distribution can be expressed as a mixture of gamma and Lindley models. A wide range of structural properties of the model is investigated, including the shape of the probability mass function, hazard rate function, reversed hazard rate function, min-max models, mean residual life, mean past life, moments, order statistics and L-moment statistics. These properties can be expressed in closed form. It is found that the proposed model can be used effectively to model over- and under-dispersed phenomena. Moreover, it can be applied to analyze asymmetric data containing extreme observations and outliers. The parameters are estimated by maximum likelihood, with the likelihood equations solved numerically by the Newton-Raphson technique. A simulation study is carried out to examine the bias and mean squared error of the estimators. Finally, the flexibility of the discrete mixture model is illustrated using three COVID-19 data sets.

    Citation: Mohamed S. Eliwa, Buthaynah T. Alhumaidan, Raghad N. Alqefari. A discrete mixed distribution: Statistical and reliability properties with applications to model COVID-19 data in various countries[J]. Mathematical Biosciences and Engineering, 2023, 20(5): 7859-7881. doi: 10.3934/mbe.2023340




    The data generated by the daily work environment are more complex in nature at present, and therefore many lifetime models have been listed in the literature to analyze and evaluate these data. Determining which probability distribution should be adopted to make inferences from the data under study is a very important problem in statistics. For these reasons, great efforts have been spent over the years in developing large categories of distributions along with relevant statistical methodologies. See, for instance, El-Gohary et al. [1], Saboor et al. [2], Jia et al. [3], Fernandez [4], Alizadeh et al. [5], Kumar et al. [6], and references cited therein. Nedjar and Zeghdoudi [7] proposed a mixture of gamma(2,τ) and Lindley(ε) (MGL) distributions. The probability density function (PDF) of the MGL distribution can be expressed as

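    $$g(x;\varepsilon,\tau)=\frac{\tau^{2}}{\varepsilon(1+\tau)}\left([\varepsilon+\varepsilon\tau-\tau]x+1\right)e^{-\tau x};\quad x>0,\ \varepsilon>0,\ \tau>0.$$ (1.1)

    Unfortunately, Eq (1.1) is not a proper PDF for every ε>0, because the coefficient $\varepsilon+\varepsilon\tau-\tau$ can be negative. Messaadia and Zeghdoudi [8] corrected the parameter space to $\varepsilon\ge\frac{\tau}{1+\tau}$ and $\tau>0$, and consequently the modified PDF of the MGL model can be written as

    $$f(x;\varepsilon,\tau)=\frac{\tau^{2}}{\varepsilon(1+\tau)}\left([\varepsilon+\varepsilon\tau-\tau]x+1\right)e^{-\tau x};\quad x>0,\ \varepsilon\ge\frac{\tau}{1+\tau},\ \tau>0.$$ (1.2)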

    The survival function (SF) corresponding to Eq (1.2) can be formulated as

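    $$S(x;\varepsilon,\tau)=\frac{(\tau x+1)(\varepsilon+\varepsilon\tau-\tau)+\tau}{\varepsilon(1+\tau)}e^{-\tau x};\quad x>0,\ \varepsilon\ge\frac{\tau}{1+\tau},\ \tau>0.$$ (1.3)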

    The quantile function (QF) is

    $$Q_X(u)=\begin{cases}-\dfrac{\varepsilon(1+\tau)}{\tau[\varepsilon(1+\tau)-\tau]}-\dfrac{1}{\tau}W_{-1}\!\left(\dfrac{\varepsilon(1+\tau)(u-1)}{\varepsilon(1+\tau)-\tau}\,e^{-\frac{\varepsilon(1+\tau)}{\varepsilon(1+\tau)-\tau}}\right); & \varepsilon>\dfrac{\tau}{1+\tau}\\[2ex] -\dfrac{\ln(1-u)}{\tau}; & \varepsilon=\dfrac{\tau}{1+\tau},\end{cases}$$ (1.4)

    where $W_{-1}$ denotes the negative branch of the Lambert W function, and $-\ln(1-u)/\tau$ is the QF of the exponential model. Sometimes, survival trials produce data that are discrete in nature, either because of the limitations of the measuring instruments or because of their inherent characteristics. The study and analysis of count data plays an important role in many fields of applied science, such as economics, engineering, marketing, medicine, and insurance. Count data sets are often modeled utilizing the Poisson model. However, the Poisson model cannot handle over-dispersed data sets, because its variance equals its mean. Therefore, it is reasonable to model such cases via a more flexible discrete distribution. A continuous distribution can be discretized by several methods. The most widely utilized technique is the survival discretization approach. For a given continuous random variable X with SF $S(x;\xi)=\Pr(X>x)$, we can obtain the discretized version as

    $$\Pr(X=x)=S(x;\xi)-S(x+1;\xi);\quad x=0,1,2,3,\ldots$$ (1.5)

    For more details, see Roy [9]. This technique has received a lot of attention in recent years. See, for instance, Gómez-Déniz and Calderín-Ojeda [10], Bebbington et al. [11], Nekoukhou et al. [12], Alamatsaz et al. [13], El-Morshedy et al. [14], Gillariose et al. [15], Singh et al. [16,17], Eliwa and El-Morshedy [18], Altun et al. [19] and references cited therein. In this paper, a discrete MGL (DsMGL) distribution is discussed, theoretically and practically, from the point of view of reliability and order statistics in order to analyze data containing extreme observations and outliers. The motivation is that El-Morshedy et al. [20] presented only simple statistical characteristics and a regression model, and only for a small complete sample in which outliers were not included. Further, that paper did not consider the reliability, order statistics, and L-moment measures, which can be applied in the fields of biomedicine and engineering. Given the importance of these measures, the authors discuss the neglected characteristics and model observations that include extreme values and outliers. Thus, the motives for this study can be summarized as follows: to formulate statistical characteristics in closed form; to model dispersed, positively skewed real data containing extreme observations and outliers; to provide a consistently better fit than other discrete models known in the current statistical literature, especially over-dispersed models; and to show that the proposed model can be applied to zero-inflated observations.

    The article is organized as follows. In Section 2, the DsMGL distribution is proposed. Various properties are derived in Sections 3 and 4. In Section 5, the DsMGL parameters are estimated by utilizing the maximum likelihood approach. Simulation study is discussed in Section 6. In Section 7, three real data sets are analyzed. Finally, some conclusions and future work are listed in Section 8.

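    Recalling Eq (1.3) and setting $\delta=e^{-\tau}$, the SF of the DsMGL distribution can be expressed as

    $$S(x;\varepsilon,\delta)=\frac{(1-\ln\delta^{x+1})(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta}{\varepsilon(1-\ln\delta)}\,\delta^{x+1};\quad x\in\mathbb{N}_0,$$ (2.1)

    where $\varepsilon\ge\frac{-\ln\delta}{1-\ln\delta}$, $0<e^{-\tau}=\delta<1$, and $\mathbb{N}_0=\{0,1,2,3,\ldots\}$. The PMF corresponding to Eq (2.1) is

    $$P_x(x;\varepsilon,\delta)=\frac{\delta^{x}}{1-\ln\delta}\left\{1-\delta-\ln\delta\,[1+x-\delta(x+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x-\delta(x+1)]\right\};\quad x\in\mathbb{N}_0.$$ (2.2)

    The CDF can be reported as

    $$F(x;\varepsilon,\delta)=1-\frac{(1-\ln\delta^{x+1})(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta}{\varepsilon(1-\ln\delta)}\,\delta^{x+1};\quad x\in\mathbb{N}_0.$$ (2.3)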

    Figure 1 shows the PMF of the DsMGL model based on various values of the parameters ε and δ.

    Figure 1.  The PMF for the DsMGL distribution.

    As can be noted, the PMF can take unimodal or decreasing form. Moreover, it can be used as a statistical approach to model zero-inflated observations under positive skew.
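As a quick numerical illustration of the survival discretization in Eq (1.5), the following minimal sketch evaluates the continuous SF of Eq (1.3), discretizes it, and compares the result with the closed-form PMF of Eq (2.2). The function names and the parameter values ε = 0.9, δ = 0.5 are illustrative assumptions and are not taken from the paper.

```python
import math

def mgl_sf(x, eps, tau):
    """Continuous MGL survival function, Eq (1.3)."""
    c = eps + eps * tau - tau  # non-negative when eps >= tau / (1 + tau)
    return ((tau * x + 1.0) * c + tau) / (eps * (1.0 + tau)) * math.exp(-tau * x)

def dsmgl_pmf(x, eps, delta):
    """DsMGL probability mass function in the (eps, delta) form of Eq (2.2)."""
    L = math.log(delta)
    inner = (1.0 - delta
             - L * (1.0 + x - delta * (x + 2))
             + (1.0 - 1.0 / eps) * L ** 2 * (x - delta * (x + 1)))
    return delta ** x / (1.0 - L) * inner

# Illustrative parameter values (assumed, not taken from the paper's fits).
eps, delta = 0.9, 0.5
tau = -math.log(delta)  # delta = exp(-tau)

# Survival discretization, Eq (1.5): Pr(X = x) = S(x) - S(x + 1).
discretized = [mgl_sf(x, eps, tau) - mgl_sf(x + 1, eps, tau) for x in range(50)]
closed_form = [dsmgl_pmf(x, eps, delta) for x in range(50)]
max_gap = max(abs(a - b) for a, b in zip(discretized, closed_form))
total_mass = sum(dsmgl_pmf(x, eps, delta) for x in range(400))
print(f"largest gap = {max_gap:.2e}, total mass = {total_mass:.12f}")
```

The two routes should agree to rounding error, and the PMF should sum to one over a sufficiently long support.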

    In reliability theory, the forward recurrence ("remaining") time and the past time are two important quantities in the theory of renewal processes. Consider, for example, the lifetime of a wireless link when a new packet arrives in a wireless network. Reliability studies model the remaining life of a component: if a component has survived up to a given time, how much longer can it be expected to survive? This question is answered by the mean residual life (MRL). In the discrete setting, the MRL, say $\Theta_i$, is defined as

    $$\Theta_i=E(T-i\mid T\ge i)=\frac{1}{1-F(i-1;\varepsilon,\delta)}\sum_{j=i+1}^{q}\left[1-F(j-1;\varepsilon,\delta)\right];\quad i\in\mathbb{N}_0,$$ (3.1)

    where $\mathbb{N}_0=\{0,1,2,3,\ldots,q\}$ for $0<q<\infty$. Thus, for the DsMGL model, the MRL is given as

    $$\Theta_i=\frac{-\delta\left[(\varepsilon-1)(\delta i-i-1)(\ln\delta)^{2}-\varepsilon(\delta(i+1)-i-2)\ln\delta+\varepsilon(\delta-1)\right]}{(\delta-1)^{2}\left\{(1-i\ln\delta)(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta\right\}};\quad i\in\mathbb{N}_0.$$ (3.2)
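The closed form in Eq (3.2) can be checked against the defining sum in Eq (3.1). The sketch below is a hedged illustration: the truncation point q = 500 stands in for an effectively infinite support, and the function names and parameter values are assumptions made for this example.

```python
import math

def sf(x, eps, delta):
    """DsMGL survival function Pr(X > x), Eq (2.1); equals 1 at x = -1."""
    L = math.log(delta)
    c = eps - eps * L + L
    return ((1.0 - (x + 1) * L) * c - L) / (eps * (1.0 - L)) * delta ** (x + 1)

def mrl_sum(i, eps, delta, q=500):
    """Mean residual life from the definition, Eq (3.1), truncated at q."""
    tail = sum(sf(j - 1, eps, delta) for j in range(i + 1, q + 1))
    return tail / sf(i - 1, eps, delta)

def mrl_closed(i, eps, delta):
    """Mean residual life from the closed form, Eq (3.2)."""
    L = math.log(delta)
    num = -delta * ((eps - 1) * (delta * i - i - 1) * L ** 2
                    - eps * (delta * (i + 1) - i - 2) * L
                    + eps * (delta - 1))
    den = (delta - 1) ** 2 * ((1 - i * L) * (eps - eps * L + L) - L)
    return num / den

for i in range(6):
    print(i, round(mrl_sum(i, 0.9, 0.5), 8), round(mrl_closed(i, 0.9, 0.5), 8))
```

At i = 0 the MRL reduces to the mean E(X), which gives a second consistency check against Table 1.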

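    Furthermore, in the discrete setting, the mean past life (MPL), say $\Theta_i^{*}$, is defined as

    $$\Theta_i^{*}=E(i-T\mid T<i)=\frac{1}{F(i-1;\varepsilon,\delta)}\sum_{m=1}^{i}F(m-1;\varepsilon,\delta);\quad i\in\mathbb{N}_0\setminus\{0\}.$$ (3.3)

    So, the MPL for the DsMGL model can be represented as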

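    $$\Theta_i^{*}=\frac{i\varepsilon(1-\ln\delta)(1-\delta)^{2}-\varepsilon(1-\ln\delta)\,\delta(1-\delta)(1-\delta^{i})+\delta\ln\delta\,[\varepsilon-(\varepsilon-1)\ln\delta]\,[1-(i+1)\delta^{i}+i\delta^{i+1}]}{(1-\delta)^{2}\left[\varepsilon(1-\ln\delta)-\delta^{i}\left(\varepsilon-\varepsilon(i+1)\ln\delta+i(\varepsilon-1)(\ln\delta)^{2}\right)\right]}.$$ (3.4)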

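    For $i\in\mathbb{N}_0$, we get $\Theta_i^{*}\le i$. The CDF of the DsMGL model can be recovered from the MPL as

    $$F(k;\varepsilon,\delta)=F(0;\varepsilon,\delta)\prod_{i=1}^{k}\left[\frac{\Theta_i^{*}}{\Theta_{i+1}^{*}-1}\right];\quad k\in\mathbb{N}_0\setminus\{0\},$$ (3.5)

    where $F(0;\varepsilon,\delta)=\left(\prod_{i=1}^{q}\left[\frac{\Theta_i^{*}}{\Theta_{i+1}^{*}-1}\right]\right)^{-1}$ and $0<q<\infty$. Thus, the mean of the DsMGL model can be expressed as

    $$\text{Mean}=i-\Theta_i^{*}F(i-1;\varepsilon,\delta)+\Theta_i\left[1-F(i-1;\varepsilon,\delta)\right];\quad i\in\mathbb{N}_0\setminus\{0\}.$$ (3.6)

    The reversed hazard rate function (RHRF) can be expressed as a function of the MPL as

    $$r(i;\varepsilon,\delta)=\frac{1-\Theta_{i+1}^{*}+\Theta_i^{*}}{\Theta_i^{*}};\quad i\in\mathbb{N}_0\setminus\{0\}.$$ (3.7)

    Further, the RHRF can be written explicitly as

    $$r(x;\varepsilon,\delta)=\frac{1-\delta-\ln\delta\,[1+x-\delta(x+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x-\delta(x+1)]}{\varepsilon(1-\ln\delta)-\left[(1-\ln\delta^{x+1})(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta\right]\delta^{x+1}}\,\varepsilon\delta^{x};\quad x\in\mathbb{N}_0.$$ (3.8)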
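The relations between the MRL, the MPL, the mean and the RHRF in Eqs (3.3), (3.6) and (3.7) can be verified numerically from the definitions. The following sketch is illustrative only: the helper names, the truncation point and the parameter values are assumptions, and the MPL is computed from its defining sum rather than from a closed form.

```python
import math

def sf(x, eps, delta):
    """DsMGL survival function Pr(X > x), Eq (2.1); equals 1 at x = -1."""
    L = math.log(delta)
    c = eps - eps * L + L
    return ((1.0 - (x + 1) * L) * c - L) / (eps * (1.0 - L)) * delta ** (x + 1)

def cdf(x, eps, delta):
    return 1.0 - sf(x, eps, delta)

def pmf(x, eps, delta):
    return sf(x - 1, eps, delta) - sf(x, eps, delta)

def mrl(i, eps, delta, q=500):
    """Mean residual life, Eq (3.1), with a large truncation point q."""
    return sum(sf(j - 1, eps, delta) for j in range(i + 1, q + 1)) / sf(i - 1, eps, delta)

def mpl(i, eps, delta):
    """Mean past life, Eq (3.3), for i >= 1."""
    return sum(cdf(m - 1, eps, delta) for m in range(1, i + 1)) / cdf(i - 1, eps, delta)

eps, delta = 0.9, 0.5  # illustrative values (assumed)
mean = sum(x * pmf(x, eps, delta) for x in range(500))

rows = []
for i in range(1, 6):
    F = cdf(i - 1, eps, delta)
    mean_from_lives = i - mpl(i, eps, delta) * F + mrl(i, eps, delta) * (1.0 - F)  # Eq (3.6)
    rhrf_direct = pmf(i, eps, delta) / cdf(i, eps, delta)
    rhrf_from_mpl = (1.0 - mpl(i + 1, eps, delta) + mpl(i, eps, delta)) / mpl(i, eps, delta)  # Eq (3.7)
    rows.append((i, mean_from_lives, rhrf_direct, rhrf_from_mpl))
    print(i, round(mean_from_lives, 6), round(rhrf_direct, 6), round(rhrf_from_mpl, 6))
```

Every row should reproduce the same mean, and the two RHRF columns should coincide.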

    Figure 2 shows the RHRF plots for different values of the DsMGL parameters.

    Figure 2.  The RHRF for the DsMGL model.

    Suppose W and H are two independent DsMGL random variables (RVs) with parameters $(\varepsilon_1,\delta_1)$ and $(\varepsilon_2,\delta_2)$, respectively. Then, the RHRFs of $T=\min(W,H)$ and $L=\max(W,H)$ can be formulated as

    $$r_T(x;\Lambda)=1-\frac{1-S_1(x-1)\,S_2(x-1)}{1-S_1(x)\,S_2(x)},\qquad S_j(x)=\frac{(1-\ln\delta_j^{x+1})(\varepsilon_j-\varepsilon_j\ln\delta_j+\ln\delta_j)-\ln\delta_j}{\varepsilon_j(1-\ln\delta_j)}\,\delta_j^{x+1},\quad j=1,2,$$ (3.9)

    and

    $$r_L(x;\Lambda)=r(x;\varepsilon_1,\delta_1)+r(x;\varepsilon_2,\delta_2)-r(x;\varepsilon_1,\delta_1)\times r(x;\varepsilon_2,\delta_2),$$ (3.10)

    where $\Lambda=(\varepsilon_1,\varepsilon_2,\delta_1,\delta_2)$ and $r(x;\varepsilon_j,\delta_j)$ is the RHRF in Eq (3.8) evaluated at $(\varepsilon_j,\delta_j)$. Since the RHRFs of the two RVs W and H are decreasing, the RHRFs of $T=\min(W,H)$ and $L=\max(W,H)$ are also decreasing. Another important measure in survival analysis is the hazard rate function (HRF). If X is a DsMGL random variable, then the HRF can be expressed as

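    $$h(x;\varepsilon,\delta)=1-\frac{\left[(1-\ln\delta^{x+1})(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta\right]\delta}{(1-\ln\delta^{x})(\varepsilon-\varepsilon\ln\delta+\ln\delta)-\ln\delta};\quad x\in\mathbb{N}_0,$$ (3.11)

    where $h(x;\varepsilon,\delta)=\frac{P_x(x;\varepsilon,\delta)}{S(x-1;\varepsilon,\delta)}$. Figure 3 shows the HRF plots for different values of the DsMGL parameters. It should be noted that the HRF is increasing, so the model is suited to phenomena with an increasing failure rate.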

    Figure 3.  The HRF for the DsMGL distribution.

    Suppose W and H are two independent RVs distributed as DsMGL$(\varepsilon_1,\delta_1)$ and DsMGL$(\varepsilon_2,\delta_2)$, respectively. Then, the HRF of $T=\min(W,H)$ can be formulated as

    $$\begin{aligned}h_T(x;\Lambda)&=\frac{\Pr(\min(W,H)=x)}{\Pr(\min(W,H)\ge x)}=\frac{\Pr(\min(W,H)\ge x)-\Pr(\min(W,H)\ge x+1)}{\Pr(\min(W,H)\ge x)}\\&=\frac{\Pr(W\ge x)\Pr(H\ge x)-\Pr(W\ge x+1)\Pr(H\ge x+1)}{\Pr(W\ge x)\Pr(H\ge x)}\\&=\frac{\Pr(W\ge x)\Pr(H=x)+\Pr(W=x)\Pr(H\ge x)-\Pr(W=x)\Pr(H=x)}{\Pr(W\ge x)\Pr(H\ge x)},\end{aligned}$$

    where $\Lambda=(\varepsilon_1,\varepsilon_2,\delta_1,\delta_2)$. Then,

    $$h_T(x;\Lambda)=h_1(x)+h_2(x)-h_1(x)\times h_2(x),\qquad h_j(x)=1-\frac{\left[(1-\ln\delta_j^{x+1})(\varepsilon_j-\varepsilon_j\ln\delta_j+\ln\delta_j)-\ln\delta_j\right]\delta_j}{(1-\ln\delta_j^{x})(\varepsilon_j-\varepsilon_j\ln\delta_j+\ln\delta_j)-\ln\delta_j},\quad j=1,2.$$ (3.12)

    Since the HRFs of the two RVs W and H are increasing, the HRF of $T=\min(W,H)$ is also increasing. Similarly, the HRF of $L=\max(W,H)$ can be expressed as

    $$h_L(x;\Lambda)=1-\frac{1-\{1-S_1(x)\}\{1-S_2(x)\}}{1-\{1-S_1(x-1)\}\{1-S_2(x-1)\}},\qquad S_j(x)=\frac{(1-\ln\delta_j^{x+1})(\varepsilon_j-\varepsilon_j\ln\delta_j+\ln\delta_j)-\ln\delta_j}{\varepsilon_j(1-\ln\delta_j)}\,\delta_j^{x+1},\quad j=1,2.$$ (3.13)
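The min and max hazard rates in Eqs (3.12) and (3.13) can be cross-checked by brute-force enumeration of the joint PMF of two independent DsMGL variables. This is a minimal sketch under assumed parameter values (0.9, 0.5) and (1.5, 0.3); the truncation of the support at N = 150 and all function names are choices made for the example.

```python
import math

def sf(x, eps, delta):
    """DsMGL survival function Pr(X > x), Eq (2.1); equals 1 at x = -1."""
    L = math.log(delta)
    c = eps - eps * L + L
    return ((1.0 - (x + 1) * L) * c - L) / (eps * (1.0 - L)) * delta ** (x + 1)

def pmf(x, eps, delta):
    return sf(x - 1, eps, delta) - sf(x, eps, delta)

def hrf(x, eps, delta):
    """Eq (3.11): h(x) = Pr(X = x) / Pr(X >= x) = 1 - S(x) / S(x - 1)."""
    return 1.0 - sf(x, eps, delta) / sf(x - 1, eps, delta)

p1, p2 = (0.9, 0.5), (1.5, 0.3)  # illustrative parameters (assumed)
N = 150                           # truncated support for the enumeration
w = [pmf(x, *p1) for x in range(N)]
h = [pmf(x, *p2) for x in range(N)]

def brute_hazard(x, pick):
    """Pr(pick(W, H) = x) / Pr(pick(W, H) >= x) by enumerating the joint PMF."""
    at = sum(w[a] * h[b] for a in range(N) for b in range(N) if pick(a, b) == x)
    at_or_above = sum(w[a] * h[b] for a in range(N) for b in range(N) if pick(a, b) >= x)
    return at / at_or_above

results = []
for x in range(4):
    h1, h2 = hrf(x, *p1), hrf(x, *p2)
    h_min = h1 + h2 - h1 * h2                                   # Eq (3.12)
    F1, F2 = 1.0 - sf(x, *p1), 1.0 - sf(x, *p2)
    F1m, F2m = 1.0 - sf(x - 1, *p1), 1.0 - sf(x - 1, *p2)
    h_max = 1.0 - (1.0 - F1 * F2) / (1.0 - F1m * F2m)           # Eq (3.13)
    results.append((h_min, brute_hazard(x, min), h_max, brute_hazard(x, max)))
    print(x, [round(v, 6) for v in results[-1]])
```

Each formula value should match its enumerated counterpart up to truncation and rounding error.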

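    Assume $X_{1:l},X_{2:l},\ldots,X_{l:l}$ are the order statistics (OS) corresponding to the random sample (RS) $X_1,X_2,\ldots,X_l$ from the DsMGL model. Then, the CDF of the ith OS is given as

    $$F_{i:l}(x;\varepsilon,\delta)=\sum_{b=i}^{l}\binom{l}{b}\left[F_i(x;\varepsilon,\delta)\right]^{b}\left[1-F_i(x;\varepsilon,\delta)\right]^{l-b}=\sum_{b=i}^{l}\sum_{j=0}^{l-b}\vartheta^{(l,b)}_{(j)}F_i(x;\varepsilon,\delta,b+j),$$ (4.1)

    where $\vartheta^{(l,b)}_{(j)}=(-1)^{j}\binom{l}{b}\binom{l-b}{j}$ and $F_i(x;\varepsilon,\delta,b+j)$ is the CDF of the exponentiated DsMGL distribution with power parameter b+j. The PMF corresponding to Eq (4.1) is

    $$f_{i:l}(x;\varepsilon,\delta)=F_{i:l}(x;\varepsilon,\delta)-F_{i:l}(x-1;\varepsilon,\delta)=\sum_{b=i}^{l}\sum_{j=0}^{l-b}\vartheta^{(l,b)}_{(j)}f_i(x;\varepsilon,\delta,b+j),$$ (4.2)

    where $f_i(x;\varepsilon,\delta,b+j)$ represents the PMF of the exponentiated DsMGL distribution with power parameter b+j. So, the vth moment of $X_{i:l}$ can be written as

    $$E(X_{i:l}^{v})=\sum_{x=0}^{\infty}\sum_{b=i}^{l}\sum_{j=0}^{l-b}\vartheta^{(l,b)}_{(j)}\,x^{v}f_i(x;\varepsilon,\delta,b+j).$$ (4.3)

    Hosking [21] defined the L-moments (L-MT) as descriptive statistics for a probability model. The L-MT of the DsMGL model can be formulated as

    $$\Upsilon_o=\frac{1}{o}\sum_{c=0}^{o-1}(-1)^{c}\binom{o-1}{c}E(X_{o-c:o}).$$ (4.4)

    Using Eq (4.4), the L-MT of mean $=\Upsilon_1$, the L-MT coefficient of variation $=\Upsilon_2/\Upsilon_1$, the L-MT coefficient of skewness $=\Upsilon_3/\Upsilon_2$, and the L-MT coefficient of kurtosis $=\Upsilon_4/\Upsilon_2$.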
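The L-moments in Eq (4.4) can be evaluated numerically from the order-statistic CDF in Eq (4.1). In the sketch below, the expectation of each order statistic is obtained with the tail-sum formula E(X) = Σ Pr(X > x) instead of the triple sum in Eq (4.3); this substitution, the truncation point and the parameter values are assumptions made to keep the example short.

```python
import math

def cdf(x, eps, delta):
    """DsMGL CDF, Eq (2.3)."""
    L = math.log(delta)
    c = eps - eps * L + L
    return 1.0 - ((1.0 - (x + 1) * L) * c - L) / (eps * (1.0 - L)) * delta ** (x + 1)

def os_cdf(x, i, l, eps, delta):
    """CDF of the i-th order statistic in a sample of size l, Eq (4.1)."""
    F = cdf(x, eps, delta)
    return sum(math.comb(l, b) * F ** b * (1.0 - F) ** (l - b) for b in range(i, l + 1))

def os_mean(i, l, eps, delta, upper=400):
    """E(X_{i:l}) via the tail-sum formula sum_{x>=0} Pr(X_{i:l} > x)."""
    return sum(1.0 - os_cdf(x, i, l, eps, delta) for x in range(upper))

def l_moment(o, eps, delta):
    """L-moment of order o, Eq (4.4)."""
    return sum((-1) ** c * math.comb(o - 1, c) * os_mean(o - c, o, eps, delta)
               for c in range(o)) / o

eps, delta = 0.9, 0.5  # illustrative values (assumed)
l1, l2, l3, l4 = (l_moment(o, eps, delta) for o in (1, 2, 3, 4))
print("L-mean     :", round(l1, 6))
print("L-CV       :", round(l2 / l1, 6))
print("L-skewness :", round(l3 / l2, 6))
print("L-kurtosis :", round(l4 / l2, 6))
```

The first L-moment should coincide with E(X), and the second equals half the expected absolute difference between two independent draws.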

    The shape of any probability model can be described by its different moments. Based on the first four moments, the mean "E(X)", variance "Var(X)", index of dispersion "D(X)", skewness "Sk(X)", and kurtosis "Ku(X)" can be derived. Let X be a DsMGL random variable. Then, the probability generating function (PGF), say Υ(z), can be formulated as

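    $$\begin{aligned}\Upsilon(z)&=\sum_{x=0}^{\infty}z^{x}P_x(x;\varepsilon,\delta)=\frac{1}{1-\ln\delta}\sum_{x=0}^{\infty}\left\{1-\delta-\ln\delta\,[1+x-\delta(x+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x-\delta(x+1)]\right\}(z\delta)^{x}\\&=\frac{-\delta(\varepsilon-1)(z-1)(\ln\delta)^{2}+\varepsilon(\delta^{2}z-2\delta+1)\ln\delta-\varepsilon(\delta-1)(\delta z-1)}{\varepsilon(\ln\delta-1)(\delta z-1)^{2}},\end{aligned}$$ (4.5)

    where the power series converges absolutely at least for all complex numbers z with $|z|\le 1$. Equation (4.5) can be derived utilizing the Maple software program. Thus, the first four moments of the DsMGL model can be listed as

    $$E(X)=\frac{-\delta\left[(\varepsilon-1)(\ln\delta)^{2}+\varepsilon(\delta-2)\ln\delta-\varepsilon(\delta-1)\right]}{\varepsilon(\ln\delta-1)(\delta-1)^{2}},$$ (4.6)
    $$E(X^{2})=\frac{\delta\left[(3\varepsilon\delta+\varepsilon-3\delta-1)(\ln\delta)^{2}+\varepsilon(\delta^{2}-3\delta-2)\ln\delta-\varepsilon\delta^{2}+\varepsilon\right]}{\varepsilon(\ln\delta-1)(\delta-1)^{3}},$$ (4.7)
    $$E(X^{3})=\frac{-\delta\left[(7\delta^{2}+10\delta+1)(\varepsilon-1)(\ln\delta)^{2}+\varepsilon(\delta+2)(\delta^{2}-6\delta-1)\ln\delta-\varepsilon(\delta-1)(\delta^{2}+4\delta+1)\right]}{\varepsilon(\ln\delta-1)(\delta-1)^{4}},$$ (4.8)

    and

    $$E(X^{4})=\delta\left[\frac{(15\delta^{3}+55\delta^{2}+25\delta+1)(\varepsilon-1)(\ln\delta)^{2}+\varepsilon(\delta^{4}-5\delta^{3}-55\delta^{2}-35\delta-2)\ln\delta}{\varepsilon(\ln\delta-1)(\delta-1)^{5}}-\frac{\delta^{4}+10\delta^{3}-10\delta-1}{(\ln\delta-1)(\delta-1)^{5}}\right].$$ (4.9)

    From the moments $E(X^{r})$, r = 1, 2, 3, 4, the quantities E(X), Var(X), Sk(X), and Ku(X) can be derived in closed form. Table 1 reports some numerical results for the DsMGL model under different values of the distribution parameters.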

    Table 1.  Some descriptive statistics under δ=0.05 and various values of ε.
    ε
    Measure 1.1 1.4 1.7 2.1 4.2 8.4
    E(X) 0.10548 0.12972 0.14540 0.15934 0.18897 0.20378
    Var(X) 0.11102 0.13466 0.14933 0.16196 0.18750 0.19962
    D(X) 1.05252 1.03809 1.02701 1.01641 0.99223 0.97955
    Sk(X) 3.31652 2.93747 2.73625 2.57885 2.29485 2.17274
    Ku(X) 14.9956 12.2687 10.94861 9.97976 8.37673 7.74605


    The DsMGL is capable of modeling positively skewed and leptokurtic data sets. Further, it is appropriate for modeling under- (over-) dispersed phenomena, since $\mathrm{Var}(X)/|E(X)|<(>)\,1$ for some parameter values.
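The first two moments can be evaluated directly to reproduce the first rows of Table 1. The sketch below codes E(X) and E(X²) as in Eqs (4.6) and (4.7); the function names are illustrative, and the printed values are meant to be compared with the table rather than taken as new results.

```python
import math

def mean_closed(eps, delta):
    """E(X), Eq (4.6)."""
    L = math.log(delta)
    num = -delta * ((eps - 1) * L ** 2 + eps * (delta - 2) * L - eps * (delta - 1))
    return num / (eps * (L - 1) * (delta - 1) ** 2)

def second_moment_closed(eps, delta):
    """E(X^2), Eq (4.7)."""
    L = math.log(delta)
    num = delta * ((3 * eps * delta + eps - 3 * delta - 1) * L ** 2
                   + eps * (delta ** 2 - 3 * delta - 2) * L
                   - eps * delta ** 2 + eps)
    return num / (eps * (L - 1) * (delta - 1) ** 3)

def dispersion_summary(eps, delta):
    """Return (E(X), Var(X), D(X)) with D(X) = Var(X) / E(X)."""
    m = mean_closed(eps, delta)
    v = second_moment_closed(eps, delta) - m ** 2
    return m, v, v / m

delta = 0.05  # value used in Table 1
for eps in (1.1, 1.4, 1.7, 2.1, 4.2, 8.4):
    m, v, d = dispersion_summary(eps, delta)
    print(f"eps={eps:<4} E(X)={m:.5f} Var(X)={v:.5f} D(X)={d:.5f}")
```

For δ = 0.05 the index of dispersion crosses 1 as ε grows, which is the over- to under-dispersion switch noted in the text.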

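    In this section, the maximum likelihood estimates (MLEs) of the model parameters are determined. Let $X_1,X_2,\ldots,X_n$ be a RS of size n from the DsMGL distribution. The log-likelihood function (LL) can be written as

    $$LL(x;\varepsilon,\delta)=\ln\delta\sum_{i=1}^{n}x_i-n\ln(1-\ln\delta)+\sum_{i=1}^{n}\ln\left(1-\delta-\ln\delta\,[1+x_i-\delta(x_i+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x_i-\delta(x_i+1)]\right).$$ (5.1)

    The MLEs of the parameters ε and δ, say $\hat\varepsilon$ and $\hat\delta$, are given by $(\hat\varepsilon,\hat\delta)=\arg\max_{(\varepsilon,\delta)}(L)$ or, equivalently in our case, $(\hat\varepsilon,\hat\delta)=\arg\max_{(\varepsilon,\delta)}(LL)$. More explicitly, the normal equations are obtained by setting the following partial derivatives to zero:

    $$\frac{\partial LL(x;\varepsilon,\delta)}{\partial\varepsilon}=\sum_{i=1}^{n}\frac{\left(\frac{\ln\delta}{\varepsilon}\right)^{2}[x_i-\delta(x_i+1)]}{1-\delta-\ln\delta\,[1+x_i-\delta(x_i+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x_i-\delta(x_i+1)]}$$ (5.2)

    and

    $$\frac{\partial LL(x;\varepsilon,\delta)}{\partial\delta}=\frac{1}{\delta}\sum_{i=1}^{n}x_i+\frac{n}{\delta(1-\ln\delta)}+\sum_{i=1}^{n}\frac{(x_i+2)(\ln\delta+1)-\frac{1+x_i}{\delta}-1+\ln\delta\left(1-\frac{1}{\varepsilon}\right)\left(\frac{2x_i-2\delta(x_i+1)}{\delta}-(x_i+1)\ln\delta\right)}{1-\delta-\ln\delta\,[1+x_i-\delta(x_i+2)]+\left(1-\frac{1}{\varepsilon}\right)(\ln\delta)^{2}[x_i-\delta(x_i+1)]}.$$ (5.3)

    The two previous equations cannot be solved analytically; therefore, a mathematical package such as R, with an iterative procedure such as the Newton-Raphson numerical optimization method, can be used to obtain the estimators.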

    In this section, a Monte Carlo simulation is performed to assess the performance of the maximum likelihood approach for the DsMGL model. The behavior of the MLEs with respect to the sample size n is examined. The evaluation is based on the following steps:

    1) Generate 10000 samples of size n= 10,11,12,...,60 from the DsMGL model based on four cases as follows: case I: (δ=0.1 and ε=0.9), case II: (δ=0.3 and ε=0.9), case III: (δ=0.5 and ε=0.9), and case IV: (δ=0.7 and ε=0.9).

    2) Generate 10000 samples of size n= 30,70,140,200,300,400,600 from the DsMGL model based on four cases as follows: case V: (δ=0.5 and ε=0.4), case VI: (δ=0.5 and ε=0.5), case VII: (δ=0.5 and ε=1.5), and case VIII: (δ=0.5 and ε=2.0).

    3) Compute the MLEs for the 10000 samples, say $\hat\varepsilon_j$ and $\hat\delta_j$ for j=1,2,3,...,10000.

    4) Compute the biases, mean-squared errors (MSE), and mean relative errors (MRE).

    5) The empirical results are shown in Figures 4–7 and Tables 2 and 3.
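The steps above can be sketched as a small simulation harness. The version below is deliberately reduced: variates are drawn by inversion of the CDF in Eq (2.3), the MRE is assumed to be the average of |estimate − truth| / |truth|, and the estimator plugged in is the sample mean (targeting E(X)) as a placeholder for the MLE, which would require a numerical optimizer. All function names, the seed and the replication count are assumptions for illustration.

```python
import math
import random

def cdf(x, eps, delta):
    """DsMGL CDF, Eq (2.3)."""
    L = math.log(delta)
    c = eps - eps * L + L
    return 1.0 - ((1.0 - (x + 1) * L) * c - L) / (eps * (1.0 - L)) * delta ** (x + 1)

def draw(eps, delta, rng):
    """One DsMGL variate by inversion: the smallest x with F(x) >= u."""
    u = rng.random()
    x = 0
    while cdf(x, eps, delta) < u:
        x += 1
    return x

def summarize(estimates, truth):
    """Bias, MSE and MRE of a list of estimates of a scalar `truth`."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    mse = sum((e - truth) ** 2 for e in estimates) / n
    mre = sum(abs(e - truth) / abs(truth) for e in estimates) / n
    return bias, mse, mre

def run_study(estimator, truth, eps, delta, n, replications, seed=2023):
    """Steps 1, 3 and 4 for one (eps, delta, n) configuration."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        sample = [draw(eps, delta, rng) for _ in range(n)]
        estimates.append(estimator(sample))
    return summarize(estimates, truth)

eps, delta = 0.9, 0.5  # case III of the simulation design
true_mean = sum(1.0 - cdf(x, eps, delta) for x in range(500))  # E(X) = sum of Pr(X > x)
sample_mean = lambda xs: sum(xs) / len(xs)  # placeholder estimator, not the MLE
bias, mse, mre = run_study(sample_mean, true_mean, eps, delta, n=50, replications=200)
print(f"bias={bias:.4f}  MSE={mse:.4f}  MRE={mre:.4f}")
```

Replacing `sample_mean` with a function that returns an MLE component, and `truth` with the corresponding true parameter, would give the quantities tabulated in the study.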

    Figure 4.  The empirical results based on case I.
    Figure 5.  The empirical results based on case II.
    Figure 6.  The empirical results based on case III.
    Figure 7.  The empirical results based on case IV.
    Table 2.  The empirical simulation results for schemes V and VI.
    Scheme V Scheme VI
    Parameter n Bias MSE MRE Bias MSE MRE
    δ 30 0.10056591 0.09006876 0.06336994 0.11632194 0.08963175 0.08223158
    70 0.09865782 0.08830367 0.04163261 0.10223666 0.06202190 0.08063179
    140 0.04369310 0.08200239 0.04001453 0.09220169 0.06101429 0.07312016
    200 0.02299658 0.07703927 0.03192237 0.08139327 0.04002236 0.06350036
    300 0.02126971 0.06005479 0.02101206 0.08023636 0.03237014 0.05519738
    400 0.01733369 0.04079031 0.00700269 0.05193079 0.01112539 0.04980373
    600 0.00234104 0.00201034 0.00023694 0.00292733 0.00026456 0.00015654
    ε 30 0.09369110 0.07793171 0.06131304 0.05969637 0.04320158 0.05336947
    70 0.08567341 0.07122367 0.05510375 0.05223691 0.03201392 0.05102125
    140 0.07011036 0.06379035 0.05022693 0.04112566 0.03036549 0.04020024
    200 0.04105604 0.05380086 0.04500102 0.03233697 0.02002137 0.03102236
    300 0.03367101 0.04522564 0.04103979 0.03102973 0.01122024 0.01909439
    400 0.02098200 0.03474369 0.03139064 0.01018674 0.00889001 0.00306970
    600 0.00523689 0.00236844 0.00053671 0.00235659 0.00059458 0.00002518

    Table 3.  The empirical simulation results for schemes VII and VIII.
    Scheme VII Scheme VIII
    Parameter n Bias MSE MRE Bias MSE MRE
    δ 30 0.025687193 0.01466875 0.01399737 0.06696575 0.08763274 0.06353187
    70 0.022658763 0.01136219 0.10145190 0.06132640 0.06923671 0.06123671
    140 0.019365811 0.00855157 0.00500215 0.05426598 0.05110123 0.05922014
    200 0.015330240 0.00431502 0.00362307 0.05120134 0.04320139 0.04623665
    300 0.013695875 0.00400236 0.00306981 0.05026971 0.03915618 0.04010656
    400 0.008532179 0.00100153 0.00168892 0.04902790 0.03101569 0.03410973
    600 0.001036942 0.00010265 0.00023296 0.00265235 0.00021691 0.00139237
    ε 30 0.108067988 0.01123177 0.08307104 0.07437669 0.07782736 0.09069587
    70 0.103365789 0.00998796 0.08223676 0.06722364 0.05220104 0.08133658
    140 0.089445810 0.00912348 0.07911671 0.04749836 0.03002336 0.06310263
    200 0.071002158 0.00722369 0.06600163 0.03229659 0.01024787 0.05154031
    300 0.053210199 0.00426971 0.03697274 0.03193261 0.01010915 0.03974904
    400 0.009571377 0.00379104 0.00409265 0.02410124 0.00580048 0.02122360
    600 0.000531041 0.00022165 0.00023291 0.00124304 0.00083190 0.00212642


    According to Figures 4–7 and Tables 2 and 3, the magnitudes of the bias, MSE, and MRE decrease toward zero as n grows (consistency property). Thus, the MLE approach can be utilized effectively for estimating the parameters.

    In this section, we demonstrate the flexibility of the DsMGL distribution in modeling COVID-19 data. Fitted models are compared using several criteria, namely, the negative log-likelihood (−LL), Akaike information criterion (AIC), modified AIC (MAIC), Hannan-Quinn information criterion (HQIC), Bayesian information criterion (BIC), and the Kolmogorov-Smirnov (K-S) test with its corresponding P-value. We compare the DsMGL model with some competitive models, namely the discrete Lindley (DsL), discrete Burr-Hatke (DsBH), new discrete distribution with one parameter (NDsIP) (see Eliwa and El-Morshedy, 2022), discrete Rayleigh (DsR), discrete Pareto (DsPa), discrete Burr type XII (DsB-XII), and modified negative binomial (NeBi) models.
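For reference, the information criteria listed above can be computed from the negative log-likelihood, the number of parameters k and the sample size n. The sketch below uses the standard textbook definitions and assumes that MAIC denotes the small-sample corrected AIC; neither the formulas nor the function name are stated in the paper, so they should be read as assumptions.

```python
import math

def information_criteria(neg_loglik, k, n):
    """Standard criteria from the negative log-likelihood, k parameters and n observations."""
    aic = 2 * k + 2 * neg_loglik
    maic = aic + 2 * k * (k + 1) / (n - k - 1)   # assumed meaning of MAIC (corrected AIC)
    bic = k * math.log(n) + 2 * neg_loglik
    hqic = 2 * k * math.log(math.log(n)) + 2 * neg_loglik
    return {"AIC": aic, "MAIC": maic, "BIC": bic, "HQIC": hqic}

# Hypothetical inputs, for illustration only.
criteria = information_criteria(neg_loglik=100.0, k=2, n=50)
for name, value in criteria.items():
    print(f"{name:5s} {value:.3f}")
```

Smaller values indicate a better trade-off between fit and complexity, which is how the tables that follow are read.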

    The data is listed in (https://www.worldometers.info/coronavirus/country/china-hong-kong-sar/) and represents the daily new cases in Hong Kong for COVID-19 from Feb. 15, 2020, to Oct. 25, 2020. The initial shape/form of this data is reported in Figure 8 using non-parametric (N-P) methods like strip, box, violin and QQ plots. It is noted that there are extreme and outliers' observations.

    Figure 8.  The N-P plots for COVID-19 data in Hong Kong.

    The MLEs with their corresponding standard error (SE), confidence interval (C. I) for the parameter(s) and goodness-of-fit test (G-O-F-T) for COVID-19 data in Hong Kong, are listed in Tables 4 and 5.

    Table 4.  The MLEs with their corresponding SE and C. I for COVID-19 data in Hong Kong.
    ε δ
    Model MLE SE C. I MLE SE C. I
    DsMGL 0.8887 0.0242 [0.8390,0.9361] 0.1062 0.0373 [0.0355,0.1786]
    DsL 0.8052 0.0200 [0.7660,0.8451]
    NDsIP 0.9110 0.0151 [0.8820,0.9390]
    DsR 0.9951 0.1361 [0.7280,1.000]
    DsB-XII 0.8173 0.0740 [0.6722,0.9622] 3.2981 1.4272 [0.5013,6.0962]
    DsPa 0.5592 0.0531 [0.4515,0.6603]
    DsBH 0.9954 0.0132 [0.9695,1.000]
    NeBi 0.9203 0.0221 [0.8772,0.9634] 0.6859 0.1023 [0.4854,0.8864]

    Table 5.  The G-O-F-T for COVID-19 data in Hong Kong.
    Model
    Statistic DsMGL DsL NDsIP DsR DsB-XII DsPa DsBH NeBi
    −LL 118.951 127.120 121.353 161.193 118.992 124.333 130.953 119.237
    AIC 241.892 256.234 244.717 324.384 241.980 250.662 263.902 242.473
    MAIC 242.233 256.343 244.824 324.493 242.325 250.777 264.014 242.816
    BIC 245.178 257.877 246.352 326.024 245.263 252.302 265.543 245.748
    HQIC 243.065 256.825 245.295 324.963 243.157 251.254 264.484 243.638
    K-S 0.157 0.246 0.186 0.761 0.277 0.367 0.565 0.167
    P-value 0.305 0.020 0.146 0.001 0.006 0.001 0.001 0.285


    The DsMGL model is the best among all the discussed models, because it has the lowest values of −LL, AIC, MAIC, BIC, HQIC and K-S. Moreover, the DsMGL model has the highest P-value among all tested distributions. Figures 9 and 10 show the estimated CDFs and P-P plots for all reported models, from which the adequacy of the DsMGL model can be seen clearly. Thus, the COVID-19 data in Hong Kong plausibly came from the DsMGL model.

    Figure 9.  The estimated CDFs for COVID-19 data in Hong Kong.
    Figure 10.  The P-P plots for COVID-19 data in Hong Kong.

    Table 6 lists some descriptive statistics (DEST) for COVID-19 data in Hong Kong utilizing the DsMGL model.

    Table 6.  Some DEST for COVID-19 data in Hong Kong.
    E(X) Var(X) D(X) Sk(X) Ku(X)
    0.1843 0.2096 1.1374 2.7779 12.1489


    The data reported here suffer from excessive dispersion "D(X) > 1". Moreover, it is moderately right skewed "Sk(X)>0" and leptokurtic "Ku(X)>3".

    The data is reported in (https://www.worldometers.info/coronavirus/country/iraq/) and represents the daily new cases in Iraq for COVID-19 from Feb. 15, 2020, to Oct. 25, 2020. In Figure 11, the N-P plots are sketched. It is noted that there is an extreme observation.

    Figure 11.  The N-P plots for COVID-19 data in Iraq.

    The MLEs with their corresponding SE, C. I for the parameter(s) and G-O-F-T for COVID-19 data in Iraq, are listed in Tables 7 and 8.

    Table 7.  The MLEs with their corresponding SE and C. I for COVID-19 data in Iraq.
    ε δ
    Model MLE SE C. I MLE SE C. I
    DsMGL 0.863 0.078 [0.709,1.015] 0.129 0.144 [0,0.411]
    DsL 0.768 0.024 [0.721,0.815]
    NDsIP 0.895 0.015 [0.865,0.926]
    DsR 0.989 0.0142 [0.710,1]
    DsB-XII 0.531 0.087 [0.361,0.701] 1.009 0.253 [0.513,1.506]
    DsPa 0.528 0.056 [0.419,0.637]
    DsBH 0.989 0.019 [0.953,1]
    NeBi 0.928 0.053 [0.824,1.032] 0.486 0.141 [0.209,0.762]

    Table 8.  The G-O-F-T for COVID-19 data in Iraq.
    Model
    Statistic DsMGL DsL NDsIP DsR DsB-XII DsPa DsBH NeBi
    −LL 107.731 114.313 109.230 138.243 112.394 112.393 116.527 109.426
    AIC 219.464 230.635 220.452 278.495 228.771 226.775 235.054 222.852
    MAIC 219.812 230.747 220.573 278.603 229.123 226.893 235.163 223.205
    BIC 222.683 232.244 222.707 280.092 231.993 228.387 236.664 226.074
    HQIC 220.595 231.195 221.025 279.051 229.914 227.343 235.623 223.988
    K-S 0.214 0.279 0.2423 0.736 0.355 0.357 0.505 0.251
    P-value 0.068 0.006 0.026 0.0001 0.0002 0.0002 0.0001 0.029


    The DsMGL model is the best among all the studied models. Figures 12 and 13 show the estimated CDFs and P-P plots for COVID-19 data in Iraq.

    Figure 12.  The estimated CDFs for COVID-19 data in Iraq.
    Figure 13.  The P-P plots for COVID-19 data in Iraq.

    From Figures 12 and 13, the data set plausibly came from the DsMGL model. Table 9 lists some DEST for COVID-19 data in Iraq based on the DsMGL model.

    Table 9.  Some DEST for COVID-19 data in Iraq.
    E(X) Var(X) D(X) Sk(X) Ku(X)
    0.2252 0.2640 1.1724 2.6089 11.2663

    Figure 14.  The N-P plots for COVID-19 data in Saudi Arabia.

    The data listed here suffer from excessive dispersion. Furthermore, it is moderately right skewed and leptokurtic.

    The data is reported in (https://en.wikipedia.org/wiki/2020_coronavirus_pandemic_in_Saudi_Arabia) and represents the daily new cases in Saudi Arabia for COVID-19 from March 1, 2020, to Sep. 13, 2021. In Figure 14, the N-P plots are reported. It is found that there are extreme observations.

    The MLEs with their corresponding SE, C. I for the parameter(s) and G-O-F-T for COVID-19 data in Saudi Arabia are reported in Tables 10 and 11.

    Table 10.  The MLEs with their corresponding SE and C. I for COVID-19 data in Saudi Arabia.
    ε δ
    Model MLE SE C. I MLE SE C. I
    DsMGL 0.891 0.0003 [0.884,0.897] 0.302 0.023 [0.300,0.305]
    DsL 0.896 0.003 [0.889,0.902]
    NDsIP 0.956 0.002 [0.953,0.959]
    DsR 0.998 0.001 [0.995,1]
    DsB-XII 0.909 0.028 [0.854,0.964] 4.067 1.300 [1.519,6.616]
    DsPa 0.688 0.012 [0.665,0.711]
    DsBH 0.998 0.002 [0.994,1.002]
    NeBi 0.521 0.125 [0.276,0.766] 0.745 0.045 [0.657,0.833]

    Table 11.  The G-O-F-T for COVID-19 data in Saudi Arabia.
    Model
    Statistic DsMGL DsL NDsIP DsR DsB-XII DsPa DsBH NeBi
    −LL 1844.941 1849.332 1879.599 1894.439 2244.195 2301.366 2641.779 1847.236
    AIC 3693.882 3693.881 3761.199 3792.878 4492.393 4604.732 5285.559 3698.472
    MAIC 3693.904 3693.904 3761.207 3792.902 4492.415 4604.741 5285.567 3698.496
    BIC 3702.285 3702.285 3765.401 3801.283 4500.796 4608.934 5289.761 3706.877
    HQIC 3697.183 3697.180 3762.849 3796.178 4495.691 4606.382 5287.209 3701.772
    K-S 0.071 0.079 0.169 0.241 0.415 0.439 0.787 0.075
    P-value 0.139 0.129 0.023 0.004 0 0 0 0.134


    According to Table 11, the DsMGL model is the best among all the tested models. Figures 15 and 16 show the estimated CDFs and P-P plots for COVID-19 data in Saudi Arabia. It is found that the data set plausibly came from the DsMGL model.

    Figure 15.  The estimated CDFs for COVID-19 data in Saudi Arabia.
    Figure 16.  The P-P plots for COVID-19 data in Saudi Arabia.

    Table 12 lists some DEST around COVID-19 data in Saudi Arabia by utilizing the DsMGL model.

    Table 12.  Some DEST for COVID-19 data in Saudi Arabia.
    E(X) Var(X) D(X) Sk(X) Ku(X)
    16.777 150.421 8.966 1.006 3.212


    The data presented here suffer from excessive dispersion. Moreover, it is moderately right skewed and leptokurtic.

    In this article, we have developed a new two-parameter discrete model, named the discrete mixture gamma-Lindley (DsMGL) distribution. Various important distributional characteristics of the DsMGL distribution have been discussed. One of the important virtues of this newly developed model is that it can not only model over-dispersed, under-dispersed, positively skewed, and leptokurtic data sets, but can also be applied to failure time data with an increasing failure rate (due to its increasing HRF). The unknown parameters of the DsMGL model have been estimated by the method of maximum likelihood (ML). A detailed Monte Carlo simulation study has been conducted to assess the behavior of the estimators. This numerical study shows that the ML estimators work satisfactorily. Finally, the modeling flexibility of the DsMGL distribution has been explored via three real data sets on COVID-19. For future work, the authors will utilize the DsMGL model to generate a bivariate extension based on a shock models approach for modeling bivariate data. Moreover, the first-order integer-valued regression model and autoregressive process will be studied in detail.

    The authors gratefully acknowledge Qassim University, represented by the Deanship of Scientific Research, for the financial support of this research under the number (COS-2022-1-1-J-25173) during the academic year 1444 AH / 2022 AD.



    [1] A. El-Gohary, A. Alshamrani, A. N. Al-Otaibi, The generalized Gompertz distribution, Appl. Math. Model., 37 (2013), 13–24. https://doi.org/10.1016/j.apm.2011.05.017 doi: 10.1016/j.apm.2011.05.017
    [2] A. Saboor, H. S. Bakouch, M. N. Khan, Beta sarhan–zaindin modified Weibull distribution, Appl. Math. Model., 40 (2016), 6604–6621. https://doi.org/10.1016/j.apm.2016.01.033 doi: 10.1016/j.apm.2016.01.033
    [3] X. Jia, S. Nadarajah, B. Guo. Bayes estimation of P (Y < X) for the Weibull distribution with arbitrary parameters, Appl. Math. Model., 47 (2017), 249–259. https://doi.org/10.1016/j.apm.2017.03.020 doi: 10.1016/j.apm.2017.03.020
    [4] A. J. Fernández, Optimal lot disposition from Poisson–Lindley count data, Appl. Math. Model., 70 (2019), 595–604. https://doi.org/10.1016/j.apm.2019.01.045 doi: 10.1016/j.apm.2019.01.045
    [5] M. Alizadeh, A. Z. Afify, M. S. Eliwa, S. Ali, The odd log-logistic Lindley-G family of distributions: Properties, Bayesian and non-Bayesian estimation with applications, Comput. Stat., 35 (2020), 281–308. https://doi.org/10.1007/s00180-019-00932-9 doi: 10.1007/s00180-019-00932-9
    [6] S. Kumar, A. S. Yadav, S. Dey, M. Saha, Parametric inference of generalized process capability index Cpyk for the power Lindley distribution, Qual. Technol. Quant. Manage., 19 (2022), 153–186. https://doi.org/10.1080/16843703.2021.1944966 doi: 10.1080/16843703.2021.1944966
    [7] S. Nedjar, H. Zeghdoudi, On gamma Lindley distribution: Properties and simulations, J. Comput. Appl. Math., 15 (2016), 167–174. https://doi.org/10.1016/j.cam.2015.11.047 doi: 10.1016/j.cam.2015.11.047
    [8] H. Messaadia, H. Zeghdoudi, Around gamma Lindley distribution, J. Mod. Appl. Stat. Methods., 16 (2017), 23.
    [9] D. Roy, The discrete normal distribution, Commun. Stat. Theory Methods, 32 (2003), 1871–1883. https://doi.org/10.1081/STA-120023256 doi: 10.1081/STA-120023256
    [10] E. Gómez-Déniz, E. Calderín-Ojeda, The discrete Lindley distribution: Properties and applications, J. Stat. Comput. Simul., 81 (2011), 1405–1416. https://doi.org/10.1080/00949655.2010.487825 doi: 10.1080/00949655.2010.487825
    [11] M. Bebbington, C. D. Lai, M. Wellington, R. Zitikis, The discrete additive Weibull distribution: A bathtub-shaped hazard for discontinuous failure data, Reliab. Eng. Syst. Saf., 106 (2012), 37–44. https://doi.org/10.1016/j.ress.2012.06.009 doi: 10.1016/j.ress.2012.06.009
    [12] V. Nekoukhou, M. H. Alamatsaz, H. Bidram, Discrete generalized exponential distribution of a second type, Statistics, 47 (2013), 876–887. https://doi.org/10.1080/02331888.2011.633707 doi: 10.1080/02331888.2011.633707
    [13] M. H. Alamatsaz, S. Dey, T. Dey, S. S. Harandi, Discrete generalized Rayleigh distribution, Pak. J. Stat., 32 (2016).
    [14] M. El-Morshedy, M. S. Eliwa, H. Nagy, A new two-parameter exponentiated discrete Lindley distribution: Properties, estimation and applications, J. Appl. Stat., 47 (2020), 354–375. https://doi.org/10.1080/02664763.2019.1638893 doi: 10.1080/02664763.2019.1638893
    [15] J. Gillariose, O. S. Balogun, E. M. Almetwally, R. A. Sherwani, F. Jamal, J. Joseph, On the discrete Weibull Marshall–Olkin family of distributions: Properties, characterizations, and applications, Axioms, 10 (2021), 287. https://doi.org/10.3390/axioms10040287 doi: 10.3390/axioms10040287
    [16] B. Singh, R. P. Singh, A. S. Nayal, A. Tyagi, Discrete inverted Nadarajah-Haghighi distribution: Properties and classical estimation with application to complete and censored data, Stat. Optim. Inform. Comput., 10 (2022), 1293–1313. https://doi.org/10.19139/soic-2310-5070-1365 doi: 10.19139/soic-2310-5070-1365
    [17] B. Singh, V. Agiwal, A. S. Nayal, A. Tyagi, A discrete analogue of Teissier distribution: Properties and classical estimation with application to count data, Relia. Theory Appl., 17 (2022), 340–355. https://doi.org/10.24412/1932-2321-2022-167-340-355 doi: 10.24412/1932-2321-2022-167-340-355
    [18] M. S. Eliwa, M. El-Morshedy, A one-parameter discrete distribution for over-dispersed data: Statistical and reliability properties with applications, J. Appl. Stat., 49 (2022), 2467–2487. https://doi.org/10.1080/02664763.2021.1905787 doi: 10.1080/02664763.2021.1905787
    [19] E. Altun, M. El-Morshedy, M. S. Eliwa, A study on discrete Bilal distribution with properties and applications on integervalued autoregressive process, Stat. J., 20 (2022), 501–528. https://doi.org/10.57805/revstat.v20i4.384 doi: 10.57805/revstat.v20i4.384
    [20] M. El-Morshedy, E. Altun, M. S. Eliwa, A new statistical approach to model the counts of novel coronavirus cases, Math. Sci., (2021), 1–4. https://doi.org/10.1007/s40096-021-00390-9
    [21] J. R. Hosking, L-moments: Analysis and estimation of distributions using linear combinations of order statistics, J. Royal Stat. Soc. Series B, 52 (1990), 105–124. https://doi.org/10.1111/j.2517-6161.1990.tb01775.x doi: 10.1111/j.2517-6161.1990.tb01775.x
  • © 2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)