
This paper proposes an inference approach based on a pivotal quantity under the adaptive progressive Type-II censoring scheme. The widely used Pareto distribution serves as the illustrative example. For this distribution, classical methods such as the maximum likelihood and bootstrap methods have limitations in constructing confidence intervals for the unknown parameters. For example, in the maximum likelihood method, the asymptotic variance-covariance matrix does not always exist. In addition, both classical methods can yield confidence intervals that do not attain the nominal level when the sample size is not large enough. Our approach resolves these limitations: it yields exact intervals for the unknown parameters and is computationally simple. Moreover, the proposed approach leads to closed-form estimators with properties such as unbiasedness and consistency. To verify the validity of the proposed methodology, a Monte Carlo simulation and a real-world data analysis are conducted. The simulation demonstrates that the proposed methodology outperforms the maximum likelihood method, and the real-world data analysis examines the applicability and scalability of the proposed methodology.
Citation: Young Eun Jeon, Suk-Bok Kang, Jung-In Seo. Pivotal-based inference for a Pareto distribution under the adaptive progressive Type-II censoring scheme[J]. AIMS Mathematics, 2024, 9(3): 6041-6059. doi: 10.3934/math.2024295
Censored data arise commonly in experiments in fields such as the sciences, public health, and medicine, so an appropriate censoring scheme is often adopted. The best-known censoring scheme is the progressive Type-II (PT-II) censoring scheme. In this scheme, the number of observed failures $m\,(\le n)$ and the censoring scheme $R=(R_1,\ldots,R_m)$ are fixed in advance, and $R_i$ surviving units are randomly withdrawn when the $i$th failure occurs. The PT-II censoring scheme is attractive because withdrawing surviving units during the experiment saves cost and time. This advantage has drawn significant attention to the scheme in the literature [1,2,3]. In particular, some authors developed inference methods based on a pivotal quantity as an alternative to the maximum likelihood method under the PT-II censoring scheme. Wang et al. [4] showed that, for small samples, a pivotal approach outperforms the maximum likelihood approach in estimating the parameters of the two-parameter Weibull distribution. Seo and Kang [5] discussed pivotal-based inference for the scale parameter of the scaled half-logistic distribution. They demonstrated that, for small samples, estimators based on a pivotal quantity outperform the maximum likelihood estimator (MLE) and approximate MLEs. Seo et al. [6] extended the idea of Lu and Tao [7] to estimate the unknown parameters of a Pareto distribution within a regression-type framework using a pivotal quantity.
However, under the PT-II censoring scheme, the total duration of the experiment can still be long. To overcome this issue, Ng et al. [8] proposed an adaptive version of the PT-II censoring scheme that introduces an ideal experiment time $T$ and proceeds as follows. If the $m$th failure occurs before time $T$, the experiment is terminated at time $X_{m:m:n}$, exactly as in the PT-II censoring scheme. If the $m$th failure occurs after time $T$, no surviving units are withdrawn from time $T$ until the $m$th failure, that is, $R_{L+1}=\cdots=R_{m-1}=0$, where $L\,(<m-1)$ is the number of failures observed before time $T$. At the time of the $m$th failure, all remaining $R_m=n-m-\sum_{i=1}^{L}R_i$ surviving units are withdrawn. This censoring scheme is called the adaptive PT-II (APT-II) censoring scheme. Its basic idea is to finish the experiment as quickly as possible once the experiment duration exceeds the predetermined time $T$. Therefore, this censoring scheme can save both total experiment time and cost while increasing the efficiency of statistical analysis. Owing to this efficiency, the APT-II censoring scheme has been studied by authors such as Sobhi and Soliman [9], Ye et al. [10], and Mohan and Chacko [11].
This paper proposes an inference method based on a pivotal quantity under the APT-II censoring scheme. To illustrate the proposed methodology, we employ a Pareto distribution with the following cumulative distribution function (CDF) and probability density function (PDF):
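$$F(x;\lambda,\theta)=1-\left(\frac{\theta}{x}\right)^{\lambda} \tag{1.1}$$
and
$$f(x;\lambda,\theta)=\lambda\theta^{\lambda}x^{-(\lambda+1)},\quad x>\theta,\ \lambda>0,\ \theta>0,$$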
respectively, where $\lambda$ is the shape parameter of interest and $\theta$ is the scale parameter. The Pareto distribution is one of the most widely used distributions for modeling real-world phenomena in fields such as economics, sociology, and engineering. For example, the distributions of income, internet traffic, and urban population are known to follow the Pareto distribution. To improve estimation performance for the Pareto distribution, several authors have studied inference for its unknown parameters using a pivotal quantity. Chen [12] used a pivotal quantity to obtain a joint confidence region for the unknown parameters. Wu [13] introduced a pivotal-based method for obtaining a joint confidence region for the unknown parameters under the doubly Type-II censoring scheme. Zhang [14] provided simplified versions of the methods of Chen [12] and Wu [13] to avoid computational difficulties. Kim et al. [15] proposed an estimation method based on a regression-type framework using a pivotal quantity, which yields a consistent estimator of the shape parameter. Despite these advantages of the pivotal-based method, Mohie El-Din et al. [16] discussed a Bayesian approach for the Pareto distribution under the APT-II censoring scheme. That approach, however, carries a substantial computational burden. As an alternative, this paper extends the pivotal approach to the APT-II censoring scheme and proposes an inference method that exploits the approach's scalability. We then substantiate the superiority and applicability of the proposed method. Notably, this study is the first to use a pivotal-based method to estimate the unknown parameters of the Pareto distribution under the APT-II censoring scheme.
For the Pareto distribution with CDF (1.1), classical methods such as the maximum likelihood and bootstrap methods have limitations in constructing confidence intervals (CIs) for $\lambda$ and $\theta$. The maximum likelihood method yields only approximate CIs, not exact ones, and it requires computing the Fisher information matrix (FIM). These approximate CIs are not guaranteed to attain the nominal level when the sample size is not large enough. In addition, the asymptotic variance-covariance matrix (AVCM) of the MLEs exists only under a constraint. In the case of the bootstrap method, the CIs fail to attain the nominal levels. These issues are explained in Section 2. By contrast, the proposed pivotal-based method readily yields exact CIs (ECIs) for $\lambda$ and $\theta$ without any conditions, even when the sample size is not large enough. It also provides closed-form inference results.
Furthermore, this paper introduces generalized pivotal quantities (GPQs) under the APT-II censoring scheme, which are applicable to inference on functions of the unknown parameters. As a specific application, we develop a method for generating replicated data of the observed APT-II censored sample from its inferred marginal distribution.
The rest of this paper is organized as follows. Section 2 proposes a pivotal-based estimation method for the unknown parameters of the Pareto distribution under the APT-II censoring scheme. Section 3 provides an algorithm based on GPQs that generates replicated data of the observed APT-II censored sample. Section 4 conducts a Monte Carlo simulation and a real data analysis to assess the proposed methodology. Section 5 concludes the paper.
Let $X_{1:m:n}\le\cdots\le X_{m:m:n}$ be an APT-II censored sample with the censoring scheme
$$R^{*}=\begin{cases}R, & \text{if } X_{m:m:n}<T,\\ \left(R_1,\ldots,R_L,\ 0^{*}_{m-L-1},\ n-m-\sum_{i=1}^{L}R_i\right), & \text{if } X_{m:m:n}>T,\end{cases} \tag{2.1}$$
where $0^{*}_{m-L-1}$ denotes a vector of zeros of size $m-L-1$. Under (2.1), the APT-II censoring scheme coincides with the PT-II censoring scheme when $X_{m:m:n}<T$. When $X_{m:m:n}>T$, it does not allow the removal of experimental units, by setting $R_{L+1}=\cdots=R_{m-1}=0$.
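As a minimal sketch (Python, standard library only), the rule (2.1) that maps the planned scheme $R$ to the effective scheme $R^{*}$ can be written as follows. The function name `effective_scheme` and the convention `L=None` for the case $X_{m:m:n}<T$ are illustrative assumptions, not notation from the paper.

```python
def effective_scheme(R, n, L=None):
    """Return the APT-II censoring scheme R* of (2.1).

    R : planned PT-II scheme (R_1, ..., R_m), assumed to satisfy m + sum(R) = n
    n : number of units on test
    L : number of failures observed before T, or None when X_{m:m:n} < T
    """
    m = len(R)
    if L is None:
        # The m-th failure occurred before T: R* is simply the planned scheme R.
        return list(R)
    kept = list(R[:L])            # removals made before T are kept
    zeros = [0] * (m - L - 1)     # no removals between T and the m-th failure
    last = n - m - sum(kept)      # every remaining unit is removed at X_{m:m:n}
    return kept + zeros + [last]


# Planned scheme R = (2, 1*11) with n = 25, m = 12, and L = 6 failures before T.
print(effective_scheme([2] + [1] * 11, n=25, L=6))  # [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
```

In every case $m+\sum_{i}R^{*}_i=n$, so all units on test are accounted for.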
Suppose that the APT-II censored sample comes from the Pareto distribution with CDF (1.1). Writing $R_i$ for the components of $R^{*}$, so that $\sum_{i=1}^{m}(1+R_i)=n$, the likelihood and log-likelihood functions are
$$L(\lambda,\theta)\propto\lambda^{m}\theta^{\lambda n}\prod_{i=1}^{m}x_{i:m:n}^{-\lambda(1+R_i)-1}$$
and
$$\log L(\lambda,\theta)=m\log\lambda+\lambda n\log\theta-\lambda\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}+C, \tag{2.2}$$
respectively, where $C$ does not depend on $\lambda$ or $\theta$. The log-likelihood function (2.2) is monotonically increasing in $\theta$, and $\theta\le x_{1:m:n}$. Hence, the MLE of $\theta$ is $\hat\theta=X_{1:m:n}$. Maximizing the log-likelihood function with $\theta=X_{1:m:n}$ then gives the MLE of $\lambda$:
$$\hat\lambda=\frac{m}{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}.$$
Note that the MLEs $\hat\lambda$ and $\hat\theta$ are biased estimators; this is proved after Lemma 2.1. Approximate CIs based on the MLEs are straightforward to construct because the MLEs are asymptotically normal. However, this requires the variances of the MLEs. These variances can be derived from the following FIM, computed from the second partial derivatives of the negative log-likelihood function:
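$$I(\lambda,\theta)=\begin{pmatrix}E\left(-\dfrac{\partial^{2}\log L(\lambda,\theta)}{\partial\lambda^{2}}\right) & E\left(-\dfrac{\partial^{2}\log L(\lambda,\theta)}{\partial\lambda\,\partial\theta}\right)\\[2mm] E\left(-\dfrac{\partial^{2}\log L(\lambda,\theta)}{\partial\theta\,\partial\lambda}\right) & E\left(-\dfrac{\partial^{2}\log L(\lambda,\theta)}{\partial\theta^{2}}\right)\end{pmatrix}=\begin{pmatrix}m/\lambda^{2} & -n/\theta\\ -n/\theta & n\lambda/\theta^{2}\end{pmatrix}. \tag{2.3}$$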
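By inverting the FIM (2.3), the AVCM of the MLEs $\hat\lambda$ and $\hat\theta$ is obtained as
$$\hat\Sigma=\left.\begin{pmatrix}m/\lambda^{2} & -n/\theta\\ -n/\theta & n\lambda/\theta^{2}\end{pmatrix}^{-1}\right|_{\lambda=\hat\lambda,\ \theta=\hat\theta}=\frac{\hat\theta^{2}\hat\lambda}{n(m-n\hat\lambda)}\begin{pmatrix}n\hat\lambda/\hat\theta^{2} & n/\hat\theta\\ n/\hat\theta & m/\hat\lambda^{2}\end{pmatrix},\quad \hat\lambda<\frac{m}{n}. \tag{2.4}$$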
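The diagonal elements of the AVCM (2.4) are the variances of $\hat\lambda$ and $\hat\theta$, respectively. Note that the AVCM (2.4) does not always exist because of the constraint $\hat\lambda<m/n$. The determinant of the FIM (2.3) is $n(m-n\lambda)/(\lambda\theta^{2})$. Hence, when $\hat\lambda\ge m/n$, the matrix is either singular or its inverse has negative diagonal elements. Since $m\le n$, the constraint also implies $\hat\lambda<1$. In other words, approximate CIs cannot always be constructed. Alternatively, we propose inference based on a pivotal quantity under the APT-II censoring scheme.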
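The following minimal Python sketch (standard library only; the function names are illustrative) computes the MLEs and checks whether the plug-in AVCM (2.4) exists. The sample is assumed to be passed as a sorted list `x` together with its effective scheme `R` ($=R^{*}$). The example input is the APT-II sample of Table 1 in Section 4, whose effective scheme $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$ follows from $R=(2,1^{*}_{11})$ and $T=3.15$.

```python
import math


def pareto_mle(x, R):
    """MLEs (lambda_hat, theta_hat) from a sorted APT-II sample x with effective scheme R."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    return m / s, x[0]


def avcm(lam, theta, n, m):
    """Inverse of the FIM (2.3) at (lam, theta), i.e. (2.4); None if lam >= m/n."""
    if lam >= m / n:
        return None  # FIM singular or inverse with negative diagonal: no valid AVCM
    c = theta ** 2 * lam / (n * (m - n * lam))
    return [[c * n * lam / theta ** 2, c * n / theta],
            [c * n / theta, c * m / lam ** 2]]


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
lam_hat, theta_hat = pareto_mle(x, R)
print(round(lam_hat, 2), theta_hat)          # 2.05 2.4946
print(avcm(lam_hat, theta_hat, n=25, m=12))  # None, because 2.05 >= 12/25
```

For these data $\hat\lambda\approx2.05$ is far above $m/n=0.48$, so the approximate CIs based on (2.4) cannot be formed.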
Let
$$Y_{i:m:n}=-\log\left(1-F(X_{i:m:n};\lambda,\theta)\right)=\lambda\log\left(\frac{X_{i:m:n}}{\theta}\right),\quad i=1,\ldots,m. \tag{2.5}$$
Then, $Y_{1:m:n}\le\cdots\le Y_{m:m:n}$ is an APT-II censored sample from the standard exponential distribution, with mean $E(Y_{i:m:n})=\sum_{j=1}^{i}1/\Gamma_j$, where $\Gamma_1=n$ and $\Gamma_j=n-\sum_{k=1}^{j-1}(1+R_k)$ for $j=2,\ldots,m$. This follows from the relationship $F(X_{i:m:n};\lambda,\theta)\overset{d}{=}U_{i:m:n}$. Here, $A\overset{d}{=}B$ denotes that $A$ and $B$ have the same distribution, and $U_{i:m:n}$ is the $i$th APT-II censored order statistic from the standard uniform distribution. Consequently, the quantity (2.5) can be expressed as $Y_{i:m:n}\overset{d}{=}-\log(1-U_{i:m:n})$. In addition, the quantity (2.5) induces the normalized spacings
$$Z_i=\Gamma_i\left(Y_{i:m:n}-Y_{i-1:m:n}\right)=\lambda\Gamma_i\log\left(\frac{X_{i:m:n}}{X_{i-1:m:n}}\right),\quad i=1,\ldots,m,$$
where $Y_{0:m:n}=0$ and, correspondingly, $X_{0:m:n}=\theta$. Note that $Z_1,\ldots,Z_m$ are independent standard exponential random variables [17]. This property leads to pivotal quantities that play an important role in deriving estimation equations, and these pivotal quantities are given in Lemma 2.1. For brevity, $\chi^{2}_{v}$ denotes the $\chi^{2}$ distribution with $v$ degrees of freedom, and $F(d_1,d_2)$ denotes the $F$ distribution with $(d_1,d_2)$ degrees of freedom.
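To make the normalized spacings concrete, here is a small Python sketch (the function names are illustrative) that computes $\Gamma_1,\ldots,\Gamma_m$ and $Z_1,\ldots,Z_m$ for given $\lambda$ and $\theta$, with $X_{0:m:n}=\theta$. It also checks numerically the summation-by-parts identity $\sum_{i=2}^{m}Z_i=\lambda\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)$, which underlies Lemma 2.1(a). The input is the Table 1 sample with $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$. The values $\lambda=\theta=2$ are arbitrary choices for the check.

```python
import math


def gammas(R):
    """Gamma_j = n - sum_{k<j} (1 + R_k), so Gamma_1 = n and Gamma_m = 1 + R_m."""
    n = len(R) + sum(R)
    out, used = [], 0
    for r in R:
        out.append(n - used)
        used += 1 + r
    return out


def normalized_spacings(x, R, lam, theta):
    """Z_i = lam * Gamma_i * log(x_i / x_{i-1}), with x_0 = theta."""
    prev, z = theta, []
    for g, xi in zip(gammas(R), x):
        z.append(lam * g * math.log(xi / prev))
        prev = xi
    return z


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
print(gammas(R))  # [25, 22, 20, 18, 16, 14, 12, 11, 10, 9, 8, 7]
z = normalized_spacings(x, R, lam=2.0, theta=2.0)
n = len(x) + sum(R)
s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
print(abs(sum(z[1:]) - 2.0 * s) < 1e-9)  # True
```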
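Lemma 2.1. Let $X_{1:m:n}\le\cdots\le X_{m:m:n}$ be an APT-II censored sample with the censoring scheme $R^{*}$ from the Pareto distribution with CDF (1.1). Then,
$$\text{(a)}\quad \mathcal{X}_1(\lambda)=2\lambda\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right),$$
$$\text{(b)}\quad \mathcal{F}(\theta)=\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)\log(X_{1:m:n}/\theta)},$$
and
$$\text{(c)}\quad \mathcal{X}_2(\theta)=2(m-1)\log\left(\sum_{j=1}^{m}(1+R_j)\log X_{j:m:n}-n\log\theta\right)-2\sum_{i=1}^{m-1}\log\left[\sum_{j=1}^{i}(1+R_j)\log X_{j:m:n}+\left(n-\sum_{j=1}^{i}(1+R_j)\right)\log X_{i:m:n}-n\log\theta\right],$$
which have $\chi^{2}_{2(m-1)}$, $F(2(m-1),2)$, and $\chi^{2}_{2(m-1)}$ distributions, respectively.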
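Proof. Summation by parts with $\Gamma_i-\Gamma_{i+1}=1+R_i$ $(i=1,\ldots,m-1)$ and $\Gamma_m=1+R_m$ gives $Z_1=n\lambda\log(X_{1:m:n}/\theta)$ and $\sum_{i=2}^{m}Z_i=\lambda\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)$. Hence, $\mathcal{X}_1(\lambda)=2\sum_{i=2}^{m}Z_i$ and $\mathcal{F}(\theta)=\left(2\sum_{i=2}^{m}Z_i/[2(m-1)]\right)/(2Z_1/2)$. Parts (a) and (b) then follow because $Z_1,\ldots,Z_m$ are independent standard exponential random variables, as mentioned earlier. In addition, the quantities $\sum_{j=1}^{i}Z_j/\sum_{j=1}^{m}Z_j$ $(i=1,\ldots,m-1)$ are distributed as the order statistics of a sample of size $m-1$ from the standard uniform distribution. Then, (c) follows from
$$\mathcal{X}_2(\theta)=-2\sum_{i=1}^{m-1}\log\left(\frac{\sum_{j=1}^{i}Z_j}{\sum_{j=1}^{m}Z_j}\right).$$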
This completes the proof.
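Lemma 2.1 can be sanity-checked by simulation. The sketch below illustrates the check under a simplifying assumption: it treats the censoring scheme as fixed (PT-II) and generates samples directly from the spacings representation $Y_{i:m:n}=\sum_{j\le i}Z_j/\Gamma_j$, $X_{i:m:n}=\theta\exp(Y_{i:m:n}/\lambda)$. It then compares the simulated means of $\mathcal{X}_1(\lambda)$ and $\mathcal{X}_2(\theta)$ with the $\chi^{2}_{2(m-1)}$ mean $2(m-1)$. The $F(2(m-1),2)$ distribution of $\mathcal{F}(\theta)$ has no finite mean. The sketch therefore checks $P(\mathcal{F}(\theta)\le1)=(1+1/(m-1))^{-(m-1)}$ instead, which follows from the CDF $1-(1+x/(m-1))^{-(m-1)}$ of $1/\mathcal{F}(\theta)\sim F(2,2(m-1))$. The seed, the number of replications, and $\lambda=1.5$ are arbitrary.

```python
import math
import random


def simulate_pt2(R, lam, theta, rng):
    """Pareto PT-II sample via Y_i = Y_{i-1} + Z_i / Gamma_i and X_i = theta * exp(Y_i / lam)."""
    n = len(R) + sum(R)
    x, y, used = [], 0.0, 0
    for r in R:
        y += rng.expovariate(1.0) / (n - used)   # n - used = Gamma_i
        x.append(theta * math.exp(y / lam))
        used += 1 + r
    return x


def pivots(x, R, lam, theta):
    """Return (X1(lambda), F(theta), X2(theta)) of Lemma 2.1."""
    m, n = len(x), len(x) + sum(R)
    lg = [math.log(v) for v in x]
    s = sum((1 + r) * l for l, r in zip(lg, R)) - n * lg[0]
    x1 = 2 * lam * s
    f = s / (n * (m - 1) * math.log(x[0] / theta))
    total = sum((1 + r) * l for l, r in zip(lg, R)) - n * math.log(theta)
    x2, acc, used = 2 * (m - 1) * math.log(total), 0.0, 0
    for i in range(m - 1):
        acc += (1 + R[i]) * lg[i]
        used += 1 + R[i]                          # n - used = Gamma_{i+1}
        x2 -= 2 * math.log(acc + (n - used) * lg[i] - n * math.log(theta))
    return x1, f, x2


rng = random.Random(2024)
R = [1, 1, 1, 0, 0, 0, 0, 9]                      # Scheme I of Section 4 (n = 20, m = 8)
m, reps = len(R), 20000
draws = [pivots(simulate_pt2(R, 1.5, 1.0, rng), R, 1.5, 1.0) for _ in range(reps)]
mean_x1 = sum(d[0] for d in draws) / reps
mean_x2 = sum(d[2] for d in draws) / reps
p_f = sum(d[1] <= 1.0 for d in draws) / reps
print(mean_x1, mean_x2, 2 * (m - 1))             # both means should be close to 14
print(p_f, (1 + 1 / (m - 1)) ** (-(m - 1)))      # both should be close to 0.393
```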
The following subsections provide inference results based on the pivotal quantities in Lemma 2.1.
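From the pivotal quantity $\mathcal{X}_1(\lambda)$ in Lemma 2.1, it is easily shown that the MLE $\hat\lambda$ has an inverse gamma distribution with parameters $(m-1,\lambda m)$. Hence, $E(\hat\lambda)=m\lambda/(m-2)\ne\lambda$, that is, $\hat\lambda$ is biased. Alternatively, an estimator of $\lambda$ is obtained as
$$\hat\lambda_u=\frac{m-2}{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}.$$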
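Theorem 2.2. The estimator $\hat\lambda_u$ is unbiased and consistent.
Proof. From the pivotal quantity $\mathcal{X}_1(\lambda)$ in Lemma 2.1, $\hat\lambda_u$ has an inverse gamma distribution with parameters $(m-1,\lambda(m-2))$. This gives $E(\hat\lambda_u)=\lambda$ and, for $m>3$, $\mathrm{Var}(\hat\lambda_u)=\lambda^{2}/(m-3)$. Since $\hat\lambda_u$ is unbiased and $\mathrm{Var}(\hat\lambda_u)\to0$ as $m\to\infty$, Chebyshev's inequality implies that $\hat\lambda_u$ converges to $\lambda$ in probability. This completes the proof.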
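A quick Monte Carlo comparison illustrates the bias result. The sketch contrasts the MLE $\hat\lambda$, whose mean is $m\lambda/(m-2)$, with the unbiased $\hat\lambda_u$. It uses Python with the standard library only. As in the previous sketches, the spacings-based generator is an illustrative assumption that treats the censoring scheme as fixed. The seed and $\lambda=1$ are arbitrary.

```python
import math
import random


def simulate_pt2(R, lam, theta, rng):
    """Pareto PT-II sample from the normalized-spacings representation."""
    n = len(R) + sum(R)
    x, y, used = [], 0.0, 0
    for r in R:
        y += rng.expovariate(1.0) / (n - used)   # Z_i / Gamma_i
        x.append(theta * math.exp(y / lam))
        used += 1 + r
    return x


def lambda_estimates(x, R):
    """Return (MLE lambda_hat, unbiased lambda_hat_u)."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    return m / s, (m - 2) / s


rng = random.Random(7)
R, lam, reps = [1, 1, 1, 0, 0, 0, 0, 9], 1.0, 20000   # Scheme I of Section 4, lambda = 1
m = len(R)
est = [lambda_estimates(simulate_pt2(R, lam, 1.0, rng), R) for _ in range(reps)]
mean_mle = sum(e[0] for e in est) / reps
mean_u = sum(e[1] for e in est) / reps
print(mean_mle, m * lam / (m - 2))   # MLE mean should be near 8/6
print(mean_u)                        # should be near 1.0
```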
We introduce another consistent estimator for λ under the APT-II censoring scheme, which is derived from a weighted least squares approach based on a regression-type framework using the quantity (2.5).
Let $D_{i:m:n}=Y_{i:m:n}-Y_{1:m:n}$ $(i=2,\ldots,m)$. Then, its expectation is
$$E(D_{i:m:n})=E(Y_{i:m:n})-E(Y_{1:m:n})=\sum_{j=1}^{i}\frac{1}{\Gamma_j}-\frac{1}{\Gamma_1}=\sum_{j=2}^{i}\frac{1}{\Gamma_j},\quad i=2,\ldots,m.$$
Using this expectation, we consider the following linear regression model:
$$E(D_{i:m:n})=\lambda\log\left(\frac{X_{i:m:n}}{X_{1:m:n}}\right)+\varepsilon_i,\quad i=2,\ldots,m, \tag{2.6}$$
where $\varepsilon_i$ is an error term with $E(\varepsilon_i)=0$. Note that (2.6) is a simple regression model with no intercept and does not depend on $\theta$. The weighted least squares estimator of $\lambda$ is obtained from (2.6) by minimizing $\sum_{i=2}^{m}w_{i:m:n}\left[E(D_{i:m:n})-\lambda\log(X_{i:m:n}/X_{1:m:n})\right]^{2}$ with respect to $\lambda$, which gives
$$\hat\lambda_w=\frac{\sum_{i=2}^{m}w_{i:m:n}E(D_{i:m:n})\log(X_{i:m:n}/X_{1:m:n})}{\sum_{i=2}^{m}w_{i:m:n}\left[\log(X_{i:m:n}/X_{1:m:n})\right]^{2}}, \tag{2.7}$$
where $w_{i:m:n}$ is the weight of the $i$th data point; the weights are not all taken to be the same constant.
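The estimator (2.7) is cheap to compute. The Python sketch below uses by default the weights $w_{i:m:n}=1/\mathrm{Var}(D_{i:m:n})=1/\sum_{j=2}^{i}\Gamma_j^{-2}$ adopted in Theorem 2.3. The function name and the optional `weights` argument are illustrative. As an assumption for the example, it is applied to the Table 1 sample with $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$. There it gives approximately 1.209, in line with $\hat\lambda_w$ in Table 2.

```python
import math


def lambda_w(x, R, weights=None):
    """Weighted least-squares estimator (2.7) of lambda.

    By default w_i = 1 / Var(D_i) = 1 / sum_{j=2}^{i} Gamma_j^{-2} (Theorem 2.3).
    `weights`, if given, lists m - 1 weights for i = 2, ..., m.
    """
    n = len(x) + sum(R)
    gam, used = [], 0
    for r in R:                        # Gamma_1, ..., Gamma_m
        gam.append(n - used)
        used += 1 + r
    num = den = e = v = 0.0
    for i in range(1, len(x)):         # 0-based i corresponds to the paper's i + 1
        e += 1.0 / gam[i]              # E(D_i)   = sum_{j=2}^{i} 1 / Gamma_j
        v += 1.0 / gam[i] ** 2         # Var(D_i) = sum_{j=2}^{i} 1 / Gamma_j^2
        w = 1.0 / v if weights is None else weights[i - 1]
        u = math.log(x[i] / x[0])      # regressor log(x_i / x_1)
        num += w * e * u
        den += w * u * u
    return num / den


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
print(lambda_w(x, R))   # close to the 1.209 reported in Table 2
```

Because the regressor is $\log(x_{i:m:n}/x_{1:m:n})$, raising every observation to a power $c>0$ divides $\hat\lambda_w$ by $c$, as it should for a shape estimator.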
Theorem 2.3. Let $w_{i:m:n}=1/\mathrm{Var}(D_{i:m:n})$ in the estimator (2.7), where $\mathrm{Var}(D_{i:m:n})=\sum_{j=2}^{i}\Gamma_j^{-2}$, $i=2,\ldots,m$. Then, the estimator $\hat\lambda_w$ is a consistent estimator.
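Proof. Since $\log(X_{i:m:n}/X_{1:m:n})=D_{i:m:n}/\lambda$, the estimator $\hat\lambda_w$ with $w_{i:m:n}=1/\mathrm{Var}(D_{i:m:n})$ can be written as
$$\hat\lambda_w=\lambda\frac{\sum_{i=2}^{m}D_{i:m:n}E(D_{i:m:n})/\mathrm{Var}(D_{i:m:n})}{\sum_{i=2}^{m}D_{i:m:n}^{2}/\mathrm{Var}(D_{i:m:n})}=\lambda\frac{\sum_{i=2}^{m}Q_{1,i:m:n}+\sum_{i=2}^{m}Q_{3,i:m:n}}{\sum_{i=2}^{m}Q_{2,i:m:n}+\sum_{i=2}^{m}Q_{3,i:m:n}}=\lambda\frac{\sum_{i=2}^{m}Q_{1,i:m:n}/m^{2}+\sum_{i=2}^{m}Q_{3,i:m:n}/m^{2}}{\sum_{i=2}^{m}Q_{2,i:m:n}/m^{2}+\sum_{i=2}^{m}Q_{3,i:m:n}/m^{2}}, \tag{2.8}$$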
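where
$$Q_{1,i:m:n}=\frac{D_{i:m:n}E(D_{i:m:n})-E^{2}(D_{i:m:n})}{\mathrm{Var}(D_{i:m:n})},\quad Q_{2,i:m:n}=\frac{D_{i:m:n}^{2}-E^{2}(D_{i:m:n})}{\mathrm{Var}(D_{i:m:n})},\quad Q_{3,i:m:n}=\frac{E^{2}(D_{i:m:n})}{\mathrm{Var}(D_{i:m:n})}.$$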
In (2.8), the quantities $\sum_{i=2}^{m}Q_{1,i:m:n}/m^{2}$ and $\sum_{i=2}^{m}Q_{2,i:m:n}/m^{2}$ converge to zero in probability as $m\to\infty$. In addition, $\sum_{i=2}^{m}Q_{3,i:m:n}/m^{2}$ converges to a constant as $m\to\infty$. Hence, the fraction in (2.8) converges to 1 in probability as $m\to\infty$. These convergence results can be shown by following Seo et al. [6], whose argument we restate in our notation.
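The quantities $\sum_{i=2}^{m}Q_{1,i:m:n}/m^{2}$ and $\sum_{i=2}^{m}Q_{2,i:m:n}/m^{2}$ converge in mean to 0 since
$$E\left(\left|\frac{1}{m^{2}}\sum_{i=2}^{m}Q_{1,i:m:n}\right|\right)=0$$
and
$$E\left(\left|\frac{1}{m^{2}}\sum_{i=2}^{m}Q_{2,i:m:n}\right|\right)=\frac{m-1}{m^{2}},$$
which implies convergence in probability [18]. In addition,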
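$$\begin{aligned}\frac{1}{m^{2}}\sum_{i=2}^{m}Q_{3,i:m:n}&=\frac{1}{m^{2}}\sum_{i=2}^{m}\frac{\left\{\sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-1}\right\}^{2}}{\sum_{j=2}^{i}\left[\sum_{k=j}^{m}(1+R_k)\right]^{-2}}\\&\ge\frac{1}{m^{2}}\sum_{i=2}^{m}\frac{\left[\sum_{j=2}^{i}\left(m-1+\sum_{k=2}^{m}R_k\right)^{-1}\right]^{2}}{\sum_{j=2}^{i}(m-i+1)^{-2}}\\&=\frac{1}{m^{2}}\sum_{i=2}^{m}(i-1)\left(\frac{m-i+1}{m-1+\sum_{k=2}^{m}R_k}\right)^{2}\\&=\frac{1}{m^{2}\left(m-1+\sum_{k=2}^{m}R_k\right)^{2}}\left(m^{2}\sum_{j=1}^{m}j+\sum_{j=1}^{m}j^{3}-2m\sum_{j=1}^{m}j^{2}\right),\end{aligned}$$
where the inequality uses $\Gamma_j=\sum_{k=j}^{m}(1+R_k)$ and $m-i+1\le\Gamma_j\le m-1+\sum_{k=2}^{m}R_k$ for $2\le j\le i\le m$. This lower bound converges to a constant as $m\to\infty$. This completes the proof.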
The estimation of $\theta$ can be accomplished by an argument similar to that used to obtain $\hat\lambda_u$. To this end, we employ the pivotal quantity $\mathcal{F}(\theta)$ in Lemma 2.1, whose inverse can be written as
$$\frac{1}{\mathcal{F}(\theta)}=\frac{2Z_1/2}{2\sum_{i=2}^{m}Z_i/[2(m-1)]}, \tag{2.9}$$
which has an $F(2,2(m-1))$ distribution. As $m\to\infty$, the denominator in (2.9) converges to 1 in probability. Hence, by Slutsky's theorem [19], the distribution of $2/\mathcal{F}(\theta)$ converges to a $\chi^{2}_{2}$ distribution. Equating $2/\mathcal{F}(\theta)$ to the mean of the $\chi^{2}_{2}$ distribution then gives the equation $2/\mathcal{F}(\theta)=2$. Solving this equation yields the estimator
$$\hat\theta_p=X_{1:m:n}\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right).$$
Theorem 2.4. The estimator $\hat\theta_p$ is asymptotically unbiased and consistent.
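Proof. The quantity $\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)/[n(m-1)]$ has a gamma distribution with shape parameter $m-1$ and rate parameter $n\lambda(m-1)$. Its moment generating function therefore gives
$$E\left[\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]=\left[1+\frac{1}{n\lambda(m-1)}\right]^{-(m-1)} \tag{2.10}$$
and
$$E\left[\exp\left(-\frac{2\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)}{n(m-1)}\right)\right]=\left[1+\frac{2}{n\lambda(m-1)}\right]^{-(m-1)}. \tag{2.11}$$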
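In addition, $X_{1:m:n}$ follows a Pareto distribution with shape parameter $n\lambda$ and scale parameter $\theta$. Hence, the first and second moments of $X_{1:m:n}$ are
$$E(X_{1:m:n})=\int_{\theta}^{\infty}n\lambda\left(\frac{\theta}{x}\right)^{\lambda n}dx=\frac{n\lambda\theta}{n\lambda-1},\quad \lambda>\frac{1}{n}, \tag{2.12}$$
and
$$E(X_{1:m:n}^{2})=\int_{\theta}^{\infty}n\lambda x\left(\frac{\theta}{x}\right)^{\lambda n}dx=\frac{n\lambda\theta^{2}}{n\lambda-2},\quad \lambda>\frac{2}{n}, \tag{2.13}$$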
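respectively. The MLE $X_{1:m:n}$ depends only on $Z_1$, whereas $\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}$ depends only on $Z_2,\ldots,Z_m$, so the two are independent. Then, using (2.10), (2.12), and this independence, the mean of the estimator $\hat\theta_p$ is obtained as
$$E(\hat\theta_p)=E\left[X_{1:m:n}\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]=E(X_{1:m:n})\,E\left[\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]=\frac{n\lambda\theta}{n\lambda-1}\left[1+\frac{1}{n\lambda(m-1)}\right]^{-(m-1)},$$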
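and using (2.10)–(2.13), the variance of the estimator $\hat\theta_p$ is obtained as
$$\begin{aligned}\mathrm{Var}(\hat\theta_p)&=\mathrm{Var}\left[X_{1:m:n}\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]\\&=\left\{E\left[\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]\right\}^{2}\mathrm{Var}(X_{1:m:n})+\mathrm{Var}\left[\exp\left(-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right)\right]E(X_{1:m:n}^{2})\\&=-\left[1+\frac{1}{n\lambda(m-1)}\right]^{-2(m-1)}\frac{n^{2}\lambda^{2}\theta^{2}}{(n\lambda-1)^{2}}+\left[1+\frac{2}{n\lambda(m-1)}\right]^{-(m-1)}\frac{n\lambda\theta^{2}}{n\lambda-2}\\&=-\left[1+\frac{1}{(m+\sum_{i=1}^{m}R_i)\lambda(m-1)}\right]^{-2(m-1)}\frac{(m+\sum_{i=1}^{m}R_i)^{2}\lambda^{2}\theta^{2}}{\left[(m+\sum_{i=1}^{m}R_i)\lambda-1\right]^{2}}+\left[1+\frac{2}{(m+\sum_{i=1}^{m}R_i)\lambda(m-1)}\right]^{-(m-1)}\frac{(m+\sum_{i=1}^{m}R_i)\lambda\theta^{2}}{(m+\sum_{i=1}^{m}R_i)\lambda-2}.\end{aligned}$$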
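As $m\to\infty$, the mean $E(\hat\theta_p)$ converges to $\theta$ and the variance $\mathrm{Var}(\hat\theta_p)$ converges to zero. Hence, $\hat\theta_p$ is asymptotically unbiased. Its mean squared error also converges to zero, which implies consistency. This completes the proof.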
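Note that $\hat\theta_p$ is biased because $E(\hat\theta_p)\ne\theta$. We therefore provide another estimator of $\theta$ that improves on $\hat\theta_p$ in terms of bias. From (2.12), $X_{1:m:n}[1-1/(n\lambda)]$ is an unbiased estimator of $\theta$ when $\lambda$ is known. The quantity $m/[(m-1)\hat\lambda]$ is unbiased for $1/\lambda$, because $E(1/\hat\lambda)=(m-1)/(\lambda m)$. Replacing $1/\lambda$ with it yields an unbiased estimator of $\theta$:
$$\hat\theta_u=X_{1:m:n}\left[1-\frac{m}{n(m-1)\hat\lambda}\right]=X_{1:m:n}\left[1-\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)}\right].$$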
The relationship $\hat\theta_p\ge\hat\theta_u$ always holds because $\exp(-a)\ge1-a$ for all real numbers $a$. This inequality follows from the convexity of the exponential function, whose graph lies above its tangent line $1-a$ at $a=0$.
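Both estimators of $\theta$ are closed-form. In the Python sketch below, the function name is illustrative, and $a=\left(\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}-n\log x_{1:m:n}\right)/[n(m-1)]$. Then $\hat\theta_p=x_{1:m:n}e^{-a}$ and $\hat\theta_u=x_{1:m:n}(1-a)$, so the ordering $\hat\theta_p\ge\hat\theta_u$ holds automatically. The example input is again the Table 1 sample with $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$.

```python
import math


def theta_estimators(x, R):
    """Return (theta_hat_p, theta_hat_u) for a sorted APT-II sample x with scheme R."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    a = s / (n * (m - 1))
    return x[0] * math.exp(-a), x[0] * (1.0 - a)


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
theta_p, theta_u = theta_estimators(x, R)
print(theta_p, theta_u, theta_p >= theta_u)   # the last value is True
```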
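Theorem 2.5. The estimator $\hat\theta_u$ is unbiased and consistent.
Proof. The inverse of the MLE $\hat\lambda$ has a gamma distribution with shape parameter $m-1$ and rate parameter $\lambda m$, and its first and second moments are
$$E\left(\frac{1}{\hat\lambda}\right)=\frac{m-1}{\lambda m} \tag{2.14}$$
and
$$E\left(\frac{1}{\hat\lambda^{2}}\right)=\frac{m-1}{\lambda^{2}m}, \tag{2.15}$$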
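respectively. Then, using (2.12) and (2.14), the mean of the estimator $\hat\theta_u$ is obtained as
$$E(\hat\theta_u)=E\left[X_{1:m:n}\left(1-\frac{m}{n(m-1)\hat\lambda}\right)\right]=E(X_{1:m:n})-E(X_{1:m:n})E\left[\frac{m}{n(m-1)\hat\lambda}\right]=\frac{n\lambda\theta}{n\lambda-1}-\frac{\theta}{n\lambda-1}=\theta.$$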
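Using (2.12)–(2.15), the variance of the estimator $\hat\theta_u$ is obtained as
$$\mathrm{Var}(\hat\theta_u)=\mathrm{Var}\left[X_{1:m:n}\left(1-\frac{m}{n(m-1)\hat\lambda}\right)\right]=\left\{E\left[1-\frac{m}{n(m-1)\hat\lambda}\right]\right\}^{2}\mathrm{Var}(X_{1:m:n})+\mathrm{Var}\left[1-\frac{m}{n(m-1)\hat\lambda}\right]E(X_{1:m:n}^{2})=\frac{m\theta^{2}}{n\lambda(n\lambda-2)(m-1)}=\frac{m\theta^{2}}{(m+\sum_{i=1}^{m}R_i)\lambda\left[(m+\sum_{i=1}^{m}R_i)\lambda-2\right](m-1)},$$
which converges to zero as $m\to\infty$. Together with unbiasedness, this implies consistency. This completes the proof.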
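The mean results of Theorems 2.4 and 2.5 can be checked by simulation. This Python sketch again assumes a fixed (PT-II) censoring scheme and generates samples from the normalized-spacings representation. It compares the simulated means of $\hat\theta$, $\hat\theta_p$, and $\hat\theta_u$ with $n\lambda\theta/(n\lambda-1)$, with the closed form of $E(\hat\theta_p)$, and with $\theta$, respectively. The seed and parameter values are arbitrary.

```python
import math
import random


def simulate_pt2(R, lam, theta, rng):
    """Pareto PT-II sample from the normalized-spacings representation."""
    n = len(R) + sum(R)
    x, y, used = [], 0.0, 0
    for r in R:
        y += rng.expovariate(1.0) / (n - used)   # spacing Z_i / Gamma_i
        x.append(theta * math.exp(y / lam))
        used += 1 + r
    return x


def theta_estimators(x, R):
    """Return (MLE theta_hat, theta_hat_p, theta_hat_u)."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    a = s / (n * (m - 1))
    return x[0], x[0] * math.exp(-a), x[0] * (1.0 - a)


rng = random.Random(11)
R, lam, theta, reps = [1, 1, 1, 0, 0, 0, 0, 9], 1.0, 1.0, 20000
m, n = len(R), len(R) + sum(R)
sims = [theta_estimators(simulate_pt2(R, lam, theta, rng), R) for _ in range(reps)]
means = [sum(s[k] for s in sims) / reps for k in range(3)]
e_mle = n * lam * theta / (n * lam - 1)                   # E(theta_hat) from (2.12)
e_p = e_mle * (1 + 1 / (n * lam * (m - 1))) ** (-(m - 1))  # E(theta_hat_p)
print(means)
print([e_mle, e_p, theta])
```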
On the other hand, the pivotal quantity $\mathcal{X}_2(\theta)$ yields the estimation equation $\mathcal{X}_2(\theta)=2(m-2)$, because $\mathcal{X}_2(\theta)/[2(m-2)]$ converges to 1 in probability as $m\to\infty$. However, this equation does not admit a closed-form solution, so it is not considered here.
As mentioned earlier, approximate CIs based on the MLEs can be constructed only when the condition $\hat\lambda<m/n$ is satisfied. Another classical way to construct CIs for $\lambda$ and $\theta$ is the bootstrap method, which proceeds as follows. First, the MLEs $\hat\lambda$ and $\hat\theta$ are calculated from the original APT-II censored sample. Second, $B$ bootstrap APT-II censored samples, denoted by $X^{(b)}_{1:m:n}\le\cdots\le X^{(b)}_{m:m:n}$, $b=1,\ldots,B$, are generated from the marginal distribution with the parameters set to the MLEs $\hat\lambda$ and $\hat\theta$. Third, the MLEs of $\lambda$ and $\theta$ are calculated from each bootstrap sample $\{X^{(b)}_{1:m:n},\ldots,X^{(b)}_{m:m:n}\}$ and denoted by $\hat\lambda^{(b)}$ and $\hat\theta^{(b)}$, $b=1,\ldots,B$. The CIs for $\lambda$ and $\theta$ are then based on the percentiles of $\{\hat\lambda^{(1)},\ldots,\hat\lambda^{(B)}\}$ and $\{\hat\theta^{(1)},\ldots,\hat\theta^{(B)}\}$. Specifically, the $100(1-\alpha)\%$ CIs for $\lambda$ and $\theta$ are $\left(\hat\lambda^{([(\alpha/2)B])},\hat\lambda^{([(1-\alpha/2)B])}\right)$ and $\left(\hat\theta^{([(\alpha/2)B])},\hat\theta^{([(1-\alpha/2)B])}\right)$ for $0<\alpha<1$. Here, $\hat\lambda^{([\alpha B])}$ and $\hat\theta^{([\alpha B])}$ denote the $[\alpha B]$th smallest values of $\{\hat\lambda^{(1)},\ldots,\hat\lambda^{(B)}\}$ and $\{\hat\theta^{(1)},\ldots,\hat\theta^{(B)}\}$, respectively. However, the CI for $\theta$ does not attain the nominal level. Each $\hat\theta^{(b)}=X^{(b)}_{1:m:n}$ exceeds $\hat\theta=X_{1:m:n}$, which in turn always exceeds the true value of $\theta$. Consequently, the percentile interval never contains $\theta$.
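The failure of the percentile bootstrap interval for $\theta$ is easy to see numerically. The Python sketch below is illustrative and makes three assumptions. It treats the censoring scheme as fixed (PT-II) and generates samples from the normalized-spacings representation. It takes $[\cdot]$ as rounding to the nearest integer. Its seed, $B=2000$, and parameter values are chosen only for the example. However the sample is drawn, every bootstrap $\hat\theta^{(b)}$ exceeds $X_{1:m:n}>\theta$, so the interval misses $\theta$.

```python
import math
import random


def simulate_pt2(R, lam, theta, rng):
    """Pareto PT-II sample from the normalized-spacings representation."""
    n = len(R) + sum(R)
    x, y, used = [], 0.0, 0
    for r in R:
        y += rng.expovariate(1.0) / (n - used)
        x.append(theta * math.exp(y / lam))
        used += 1 + r
    return x


def mle(x, R):
    """MLEs (lambda_hat, theta_hat)."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    return m / s, x[0]


def percentile_ci(values, alpha):
    """Percentile interval using the [(alpha/2)B]-th and [(1-alpha/2)B]-th smallest values."""
    v = sorted(values)
    b = len(v)
    lo = v[max(1, round(alpha / 2 * b)) - 1]
    hi = v[round((1 - alpha / 2) * b) - 1]
    return lo, hi


rng = random.Random(3)
R, theta = [1, 1, 1, 0, 0, 0, 0, 9], 1.0
x = simulate_pt2(R, 1.0, theta, rng)
lam_hat, theta_hat = mle(x, R)
boot = [mle(simulate_pt2(R, lam_hat, theta_hat, rng), R) for _ in range(2000)]
lam_ci = percentile_ci([b[0] for b in boot], 0.05)
theta_ci = percentile_ci([b[1] for b in boot], 0.05)
print(theta_ci, theta_ci[0] > theta)   # the lower limit exceeds the true theta = 1
```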
The pivotal-based interval inference provided below addresses these limitations. Unlike the FIM-based approach, it does not require complex mathematical calculations. Because the resulting intervals are exact, they attain the nominal level even for a small sample size. Here, ECIs for $\lambda$ and $\theta$ are derived from the pivotal quantities in Lemma 2.1.
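For $\lambda$, since the pivotal quantity $\mathcal{X}_1(\lambda)$ in Lemma 2.1 has a $\chi^{2}_{2(m-1)}$ distribution, it follows that
$$1-\alpha=P\left[\chi^{2}_{1-\alpha/2,2(m-1)}<2\lambda\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)<\chi^{2}_{\alpha/2,2(m-1)}\right]$$
for $0<\alpha<1$, where $\chi^{2}_{\alpha,2(m-1)}$ denotes the upper $\alpha$ percentile of the $\chi^{2}_{2(m-1)}$ distribution. Then, an exact $100(1-\alpha)\%$ CI for $\lambda$ is given by
$$\left[\frac{\chi^{2}_{1-\alpha/2,2(m-1)}}{2\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)},\ \frac{\chi^{2}_{\alpha/2,2(m-1)}}{2\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)}\right].$$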
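For $\theta$, since the pivotal quantity $\mathcal{F}(\theta)$ in Lemma 2.1 has an $F(2(m-1),2)$ distribution, it follows that
$$1-\alpha=P\left[F_{1-\alpha/2}(2(m-1),2)<\frac{\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}}{n(m-1)\log(X_{1:m:n}/\theta)}<F_{\alpha/2}(2(m-1),2)\right]$$
for $0<\alpha<1$, where $F_{\alpha}(d_1,d_2)$ denotes the upper $\alpha$ percentile of the $F(d_1,d_2)$ distribution. Solving the inequalities for $\theta$ and using $F_{1-\alpha/2}(d_1,d_2)=1/F_{\alpha/2}(d_2,d_1)$, an exact $100(1-\alpha)\%$ CI for $\theta$ is given by
$$\left[X_{1:m:n}\exp\left(-F_{\alpha/2}(2,2(m-1))\,g(\mathbf{X})\right),\ X_{1:m:n}\exp\left(-F_{1-\alpha/2}(2,2(m-1))\,g(\mathbf{X})\right)\right],$$
where $g(\mathbf{X})=\left(\sum_{i=1}^{m}(1+R_i)\log X_{i:m:n}-n\log X_{1:m:n}\right)/[n(m-1)]$.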
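Both ECIs need only percentiles of $\chi^{2}$ and $F$ distributions with even degrees of freedom, which can be computed without special libraries. The Python sketch below is an illustrative implementation based on that observation. It inverts the closed-form survival function $P(\chi^{2}_{2k}>x)=e^{-x/2}\sum_{j=0}^{k-1}(x/2)^{j}/j!$ by bisection. It also uses $F_{\alpha}(2,d_2)=(d_2/2)(\alpha^{-2/d_2}-1)$, which follows from the $F(2,d_2)$ CDF $1-(1+2x/d_2)^{-d_2/2}$. A statistics library's quantile functions would serve equally well. The function names are hypothetical. Applied to the Table 1 sample with $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$, the sketch gives intervals close to those in Table 2.

```python
import math


def chi2_sf_even(x, df):
    """P(chi^2_df > x) for even df = 2k (Erlang survival function)."""
    h, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= h / j
        total += term
    return math.exp(-h) * total


def chi2_upper(alpha, df):
    """Upper-alpha percentile chi^2_{alpha, df} for even df, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf_even(hi, df) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_sf_even(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f2_upper(alpha, d2):
    """Upper-alpha percentile F_alpha(2, d2), from the closed-form F(2, d2) CDF."""
    return (d2 / 2.0) * (alpha ** (-2.0 / d2) - 1.0)


def exact_cis(x, R, alpha=0.05):
    """Exact 100(1 - alpha)% CIs for lambda and theta."""
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    df = 2 * (m - 1)
    lam_ci = (chi2_upper(1 - alpha / 2, df) / (2 * s),
              chi2_upper(alpha / 2, df) / (2 * s))
    g = s / (n * (m - 1))
    theta_ci = (x[0] * math.exp(-f2_upper(alpha / 2, df) * g),
                x[0] * math.exp(-f2_upper(1 - alpha / 2, df) * g))
    return lam_ci, theta_ci


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
print(exact_cis(x, R))
```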
Let $X^{rep}_{i:m:n}$ $(i=1,\ldots,m)$ be replicated data of the observed APT-II censored sample $\mathbf{x}=\{x_{1:m:n},\ldots,x_{m:m:n}\}$ with the censoring scheme $R^{*}$. The replicated data $X^{rep}_{i:m:n}$ are generated from the marginal distribution $F(X^{rep}_{i:m:n}\mid\mathbf{x};\lambda,\theta)$ inferred from the observed APT-II censored sample. To this end, GPQs for $\lambda$ and $\theta$ are first defined as
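$$G_1(\lambda)=\frac{\mathcal{X}_1(\lambda)}{2\left(\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}-n\log x_{1:m:n}\right)}$$
and
$$G_2(\theta)=\exp\left(\log x_{1:m:n}-\frac{\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}-n\log x_{1:m:n}}{n(m-1)\mathcal{F}(\theta)}\right),$$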
respectively, following the argument of Weerahandi [20]. The GPQs $G_1(\lambda)$ and $G_2(\theta)$ have the two required properties: their distributions are free of unknown parameters, and their realizations do not depend on the nuisance parameter.
Realizations of $G_1(\lambda)$ and $G_2(\theta)$ are obtained from pseudorandom draws from the $\chi^{2}_{2}$ distribution. This relies on the representation used to derive $\mathcal{X}_1(\lambda)$ and $\mathcal{F}(\theta)$ in Lemma 2.1, in which each $2Z_i$ is a $\chi^{2}_{2}$ random variable. The replicated data $X^{rep}_{i:m:n}$ are then generated by substituting these realizations for $\lambda$ and $\theta$ in the marginal distribution $F(X^{rep}_{i:m:n}\mid\mathbf{x};\lambda,\theta)$. The detailed steps are given in the following algorithm:
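Algorithm 1
(a) Generate $\zeta_1,\ldots,\zeta_m$ from the $\chi^{2}_{2}$ distribution.
(b) Compute $\mathcal{X}^{*}=\sum_{i=2}^{m}\zeta_i$, followed by $G^{*}_1(\lambda)=\mathcal{X}^{*}\big/\left[2\left(\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}-n\log x_{1:m:n}\right)\right]$.
(c) Compute $\mathcal{F}^{*}=\left(\sum_{i=2}^{m}\zeta_i/[2(m-1)]\right)/(\zeta_1/2)$, followed by $G^{*}_2(\theta)=\exp\left(\log x_{1:m:n}-\dfrac{\sum_{i=1}^{m}(1+R_i)\log x_{i:m:n}-n\log x_{1:m:n}}{n(m-1)\mathcal{F}^{*}}\right)$.
(d) Generate $X^{rep}_{i:m:n}$ from the sampling distribution $F(X^{rep}_{i:m:n}\mid\mathbf{x};G^{*}_1(\lambda),G^{*}_2(\theta))$.
(e) Repeat steps (a)–(d) $N\,(\ge10{,}000)$ times.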
From Algorithm 1, a $100(1-\alpha)\%$ interval for the replicated data $X^{rep}_{i:m:n}$ is constructed as
$$\left(X^{rep,([(\alpha/2)N])}_{i:m:n},\ X^{rep,([(1-\alpha/2)N])}_{i:m:n}\right),$$
where $X^{rep,([\alpha N])}_{i:m:n}$ denotes the $[\alpha N]$th smallest value of $\{X^{rep,1}_{i:m:n},\ldots,X^{rep,N}_{i:m:n}\}$. This interval is used to evaluate the uncertainty of the replicated data $X^{rep}_{i:m:n}$ in Section 4.2.
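Algorithm 1 and the percentile interval above can be sketched in Python as follows. Step (d) is not fully specified in the text, so the sketch makes an assumption there. It draws each replicated sample as a PT-II sample with the observed effective scheme $R^{*}$ from the Pareto distribution with $(\lambda,\theta)=(G^{*}_1(\lambda),G^{*}_2(\theta))$, using the normalized-spacings representation. The function and argument names are hypothetical, and $[\cdot]$ is taken as rounding to the nearest integer. This is a sketch, not the authors' implementation.

```python
import math
import random


def algorithm1_intervals(x, R, N=10000, alpha=0.05, seed=0):
    """GPQ-based replicated data (Algorithm 1) and 100(1-alpha)% intervals per order statistic."""
    rng = random.Random(seed)
    m, n = len(x), len(x) + sum(R)
    s = sum((1 + r) * math.log(v) for v, r in zip(x, R)) - n * math.log(x[0])
    gam, used = [], 0
    for r in R:                                                  # Gamma_1, ..., Gamma_m
        gam.append(n - used)
        used += 1 + r
    reps = [[] for _ in range(m)]
    for _ in range(N):                                           # step (e)
        zeta = [2.0 * rng.expovariate(1.0) for _ in range(m)]    # (a) chi^2_2 draws
        g1 = sum(zeta[1:]) / (2.0 * s)                           # (b) G1*(lambda)
        f_star = (sum(zeta[1:]) / (2.0 * (m - 1))) / (zeta[0] / 2.0)
        g2 = math.exp(math.log(x[0]) - s / (n * (m - 1) * f_star))  # (c) G2*(theta)
        y = 0.0                                                  # (d) assumed PT-II draw with R*
        for i in range(m):
            y += rng.expovariate(1.0) / gam[i]
            reps[i].append(g2 * math.exp(y / g1))
    intervals = []
    for col in reps:
        col.sort()
        lo = col[max(1, round(alpha / 2 * N)) - 1]
        hi = col[round((1 - alpha / 2) * N) - 1]
        intervals.append((lo, hi))
    return intervals


x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]
R = [2, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 6]
for xi, (lo, hi) in zip(x, algorithm1_intervals(x, R)):
    print(f"{xi:.4f}  ({lo:.4f}, {hi:.4f})")
```

Under this construction, the replicated first order statistic exceeds $x_{1:m:n}$ with probability exactly 1/2, so $x_{1:m:n}$ always sits near the middle of its interval.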
The proposed methodology is evaluated using the Monte Carlo simulation technique. Moreover, its applicability and scalability are examined by performing real-world data analysis.
To verify the estimation performance of the proposed estimators in comparison with the MLEs, Monte Carlo simulations with 10,000 replications are conducted using the R software. For the Pareto distribution, the shape parameter of interest $\lambda$ takes the values 0.5(0.5)1.5, that is, 0.5, 1.0, and 1.5, to show how the results vary with $\lambda$. The scale parameter is set to $\theta=1$ without loss of generality. In addition, the following censoring schemes are employed to highlight the strength of the proposed methodology, in various settings, when the sample size is not large enough:
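Scheme I: $n=20$, $m=8$, $R=(1^{*}_{3},0^{*}_{4},9)$;
Scheme II: $n=20$, $m=8$, $R=(2^{*}_{2},0^{*}_{5},8)$;
Scheme III: $n=20$, $m=6$, $R=(1^{*}_{2},0^{*}_{3},12)$;
Scheme IV: $n=20$, $m=6$, $R=(2,0^{*}_{4},12)$;
Scheme V: $n=40$, $m=16$, $R=(1^{*}_{7},0^{*}_{8},17)$;
Scheme VI: $n=40$, $m=16$, $R=(2^{*}_{6},0^{*}_{9},12)$;
Scheme VII: $n=40$, $m=12$, $R=(1^{*}_{5},0^{*}_{6},23)$;
Scheme VIII: $n=40$, $m=12$, $R=(2^{*}_{4},0^{*}_{7},20)$,
where $a^{*}_{k}$ denotes $k$ consecutive copies of $a$.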
For each censoring scheme, an APT-II censored sample is generated using the following steps:
(a) Generate an ordinary PT-II censored sample $X^{*}_{1:m:n},\ldots,X^{*}_{m:m:n}$ with the censoring scheme $R$ using the algorithm of Balakrishnan and Sandhu [21] as follows:
(1) Generate $m$ random numbers $W_1,\ldots,W_m$ from the standard uniform distribution.
(2) Compute $V_i=W_i^{1/(i+R_m+\cdots+R_{m-i+1})}$, $i=1,\ldots,m$.
(3) Compute $U_i=1-V_mV_{m-1}\cdots V_{m-i+1}$, $i=1,\ldots,m$; then $U_1,\ldots,U_m$ is a PT-II censored sample of size $m$ from the standard uniform distribution.
(4) Compute $X^{*}_{i:m:n}=\theta/(1-U_i)^{1/\lambda}$, $i=1,\ldots,m$, to obtain a PT-II censored sample from the Pareto distribution with CDF (1.1).
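(b) If $X^{*}_{m:m:n}<T$, the PT-II censored sample from (a) is already the APT-II censored sample, and the remaining steps are skipped. Otherwise, determine the value of $L$ such that $X^{*}_{L:m:n}<T<X^{*}_{L+1:m:n}$.
(c) Generate the first $m-L-1$ order statistics from the truncated distribution $f(x)/[1-F(x_{L+1:m:n})]$, $x>x_{L+1:m:n}$, with sample size $n-\sum_{i=1}^{L}(1+R_i)-1$.
(d) Replace $X^{*}_{L+2:m:n},\ldots,X^{*}_{m:m:n}$ with the $m-L-1$ order statistics obtained in (c).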
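The generation steps (a)–(d) can be sketched in Python as follows (standard library only). The function name and its return convention `(x, R_star, L)`, with `L=None` when no adaptation occurs, are hypothetical. Step (c) uses the fact that the Pareto distribution truncated below at $c=x_{L+1:m:n}$ is again a Pareto distribution with shape $\lambda$ and scale $c$. Its draws are therefore $c\,V^{-1/\lambda}$ with $V$ standard uniform.

```python
import math
import random


def generate_apt2(n, R, lam, theta, T, rng):
    """Simulate an APT-II censored Pareto sample; returns (x, R_star, L)."""
    m = len(R)
    # (a) PT-II sample via the algorithm of Balakrishnan and Sandhu
    W = [1.0 - rng.random() for _ in range(m)]                   # uniforms in (0, 1]
    V = [W[i - 1] ** (1.0 / (i + sum(R[m - i:]))) for i in range(1, m + 1)]
    x = []
    for i in range(1, m + 1):
        one_minus_u = math.prod(V[m - i:])                       # V_m ... V_{m-i+1} = 1 - U_i
        x.append(theta * one_minus_u ** (-1.0 / lam))            # theta / (1 - U_i)^(1/lam)
    if x[-1] < T:
        return x, list(R), None                                  # ordinary PT-II sample
    # (b) number of failures observed before T
    L = sum(1 for v in x if v < T)
    # (c) first m - L - 1 order statistics from the Pareto truncated at x_{L+1:m:n}
    c = x[L]
    k = n - sum(1 + r for r in R[:L]) - 1
    extra = sorted(c * (1.0 - rng.random()) ** (-1.0 / lam) for _ in range(k))
    # (d) replace X*_{L+2}, ..., X*_m and record the effective scheme R*
    x = x[:L + 1] + extra[:m - L - 1]
    R_star = list(R[:L]) + [0] * (m - L - 1) + [n - m - sum(R[:L])]
    return x, R_star, L


rng = random.Random(42)
x, R_star, L = generate_apt2(20, [1, 1, 1, 0, 0, 0, 0, 9], lam=1.0, theta=1.0, T=2.5, rng=rng)
print(L, R_star)
print([round(v, 3) for v in x])
```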
To ensure that more than one failure is observed before $T$ in the simulated APT-II censored samples, we set $T=2.5$. Based on the generated APT-II censored samples, the mean squared errors (MSEs) and biases of the estimators are computed for each censoring scheme; the results are reported in Figure 1. In addition, the coverage probabilities (CPs) of the exact 95% CIs are provided in Figure 2. As discussed in Section 2, the approximate and bootstrap CIs based on the MLEs suffer from constraints. Accordingly, only the results for the proposed ECIs for $\lambda$ and $\theta$ are reported.
In Figure 1, the length of each line indicates the deviation from the true value, so a shorter line means better performance of the corresponding estimator. On this basis, the results are summarized as follows. For $\lambda$, $\hat\lambda_u$ performs best in terms of MSE and bias, followed by $\hat\lambda_w$. For $\theta$, $\hat\theta_p$ and $\hat\theta_u$ are more efficient than $\hat\theta$ in terms of MSE and bias. Specifically, $\hat\theta_u$ has the smallest bias, as expected, while $\hat\theta_p$ is slightly better than $\hat\theta_u$ in terms of MSE. Overall, the proposed estimators generally outperform the corresponding MLEs in terms of MSE and bias. Therefore, the proposed estimators, especially $\hat\lambda_u$ and $\hat\theta_u$, are strongly recommended for inference on the unknown parameters of the Pareto distribution with CDF (1.1) when the sample size is not large enough. In Figure 2, the length of each line represents the deviation from the nominal level 0.95. By this measure, the CPs of the proposed intervals are very close to the nominal level.
In conclusion, the results in Figures 1 and 2 reveal that the proposed estimation method yields more satisfactory results than the maximum likelihood estimation method in a situation where the sample size is not large enough.
The Pareto distribution is often used to model a wide range of real-world data in fields such as insurance, reliability, engineering, and economics. In this subsection, the COVID-19 mortality rate data studied by Almetwally et al. [22] and Nik et al. [23] are analyzed to validate the practical application of the proposed approaches. Here, the mortality rate refers to the proportion of individuals who have died in a specific population or group affected by a particular disease. Nik et al. [23] applied a new Pareto-type distribution to the COVID-19 mortality rate in Canada over 25 days, from 10 April to 4 May 2020. The data are as follows:
3.1091, 3.3825, 3.1444, 3.2135, 2.4946, 3.5146, 4.9274, 3.3769, 6.8686, 3.0914, 4.9378, 3.1091, 3.2823,
3.8594, 4.0480, 4.1685, 3.6426, 3.2110, 2.8636, 3.2218, 2.9078, 3.6346, 2.7957, 4.2781, 4.2202
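For the analysis, an APT-II censored sample is generated from the above dataset ($n=25$) by setting $m=12$, $R=(2,1^{*}_{11})$, and $T=3.15$, where $1^{*}_{11}$ denotes 11 consecutive ones. Since $x_{6:12:25}=3.1091<T<x_{7:12:25}=3.2218$, we have $L=6$. The effective censoring scheme is therefore $R^{*}=(2,1^{*}_{5},0^{*}_{5},6)$. The generated APT-II censored sample and the analysis results are reported in Tables 1 and 2, respectively.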
$i$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
$x_{i:12:25}$ | 2.4946 | 2.7957 | 2.8636 | 2.9078 | 3.1091 | 3.1091 | 3.2218 | 3.2823 | 3.3769 | 3.3825 | 3.6346 | 3.6426 |
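Table 2. Point estimates and 95% intervals based on the APT-II censored sample in Table 1

| $\hat{\lambda}$ | $\hat{\lambda}_u$ | $\hat{\lambda}_w$ | $\hat{\theta}$ | $\hat{\theta}_p$ | $\hat{\theta}_u$ | 95% ECI for $\lambda$ | 95% ECI for $\theta$ |
|---|---|---|---|---|---|---|---|
| 2.050 | 1.708 | 1.209 | 2.495 | 2.441 | 2.442 | (0.938, 3.141) | (2.272, 2.493) |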
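The following Python sketch shows how part of Table 2 can be computed from Table 1 using only the standard library. It rests on assumptions not stated in this section: CDF (1.1) is taken as $F(x)=1-(\theta/x)^{\lambda}$, $x\ge\theta$; the pivots are taken as $2\lambda W\sim\chi^2(2m-2)$ and $n(m-1)\log(X_{1:m:n}/\theta)/W\sim F(2,2m-2)$ with $W=\sum_{i=1}^{m}(1+R_i)\log(x_{i:m:n}/x_{1:m:n})$ over the effective removals; and $\hat{\lambda}_u$ is assumed to be $(m-2)/W$. $\hat{\lambda}_w$, $\hat{\theta}_p$, and $\hat{\theta}_u$ are not attempted because their definitions are not given here.

```python
import math

def chi2_ppf_even(p, df, lo=0.0, hi=1000.0, tol=1e-10):
    """Chi-square quantile for even df = 2k, by bisection on the closed-form CDF
    P(chi2 <= x) = 1 - exp(-x/2) * sum_{j<k} (x/2)**j / j!."""
    def cdf(x):
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return 1.0 - math.exp(-x / 2) * total
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def f2_ppf(p, nu):
    """Quantile of F(2, nu), from its CDF 1 - (1 + 2x/nu)**(-nu/2)."""
    return (nu / 2) * ((1 - p) ** (-2 / nu) - 1)

x = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
     3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]   # Table 1
n, m, T, alpha = 25, 12, 3.15, 0.05
R = [2] + [1] * 11

# Effective APT-II scheme: planned R_i before T, zero afterwards, the rest at the m-th failure
J = sum(xi < T for xi in x)
eff_R = [R[i] if i < J else 0 for i in range(m - 1)]
eff_R.append(n - m - sum(eff_R))

# W = sum (1 + R_i) log(x_i / x_1); assumed pivot 2*lambda*W ~ chi2(2m - 2)
W = sum((1 + r) * math.log(xi / x[0]) for xi, r in zip(x, eff_R))
lam_mle = m / W
lam_unb = (m - 2) / W            # assumed form of lambda_u, since E[(m-2)/W] = lambda
theta_mle = x[0]                 # MLE of theta is the smallest observation

df = 2 * (m - 1)
lam_ci = (chi2_ppf_even(alpha / 2, df) / (2 * W),
          chi2_ppf_even(1 - alpha / 2, df) / (2 * W))
# Assumed pivot for theta: n(m-1) log(X_1/theta) / W ~ F(2, 2m - 2)
c = W / (n * (m - 1))
theta_ci = (x[0] * math.exp(-f2_ppf(1 - alpha / 2, df) * c),
            x[0] * math.exp(-f2_ppf(alpha / 2, df) * c))

print(round(lam_mle, 3), round(lam_unb, 3), round(theta_mle, 3))
print([round(v, 3) for v in lam_ci], [round(v, 3) for v in theta_ci])
```

Under these assumptions the effective scheme is $(2,1,1,1,1,1,0,0,0,0,0,6)$, since six failures occur before $T=3.15$, and the computed values should agree with the corresponding entries of Table 2 up to rounding in the third decimal.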
Additionally, to examine the applicability of Algorithm 1 in Section 3, the 95% intervals for the replicated data $X^{\mathrm{rep}}_{i:12:25}$ are obtained based on $N=20000$ replicates. The resulting plot, presented in Figure 3, shows that the observed APT-II censored sample lies well within the 95% intervals, which supports the applicability of Algorithm 1 to real-world data analysis. Figure 3 also shows goodness-of-fit results obtained through the plug-in method, which generates the replicated data $X^{\mathrm{rep}}_{i:12:25}$ by substituting estimators for the unknown parameters $\lambda$ and $\theta$ in the marginal distribution $F(X^{\mathrm{rep}}_{i:12:25}\mid x;\lambda,\theta)$. Here, 20,000 replicated datasets are drawn from each of $F(X^{\mathrm{rep}}_{i:12:25}\mid x;\hat{\lambda},\hat{\theta})$ and $F(X^{\mathrm{rep}}_{i:12:25}\mid x;\hat{\lambda}_u,\hat{\theta}_p)$, and their empirical means are computed. Because the points in Figure 3 lie close to a straight line, the APT-II censored sample in Table 1 appears consistent with the Pareto distribution.
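The plug-in check can be sketched in Python as below. This reflects our reading of the method rather than the authors' implementation: replicated APT-II samples are simulated from the Pareto model at plug-in values, and the empirical mean of each replicated order statistic is paired with the observed value. It assumes CDF (1.1) is $F(x)=1-(\theta/x)^{\lambda}$ and uses the fact that $\log(X/\theta)$ is exponential with rate $\lambda$; Algorithm 1 itself (GPQ-based) is not reproduced.

```python
import math
import random

def apt2_pareto(lam, theta, n, m, R, T, rng):
    """One APT-II censored sample from F(x) = 1 - (theta/x)**lam (assumed form of CDF (1.1)).

    log(X/theta) is exponential with rate lam, so by memorylessness the next
    failure among `at_risk` units arrives after an Exp(lam * at_risk) spacing
    on the log scale, regardless of which survivors were removed earlier.
    """
    y, at_risk, x = 0.0, n, []
    for i in range(m):
        y += rng.expovariate(lam * at_risk)
        x.append(theta * math.exp(y))
        if i < m - 1:
            r = R[i] if x[-1] < T else 0   # adaptive rule: stop removing once T is passed
            at_risk -= 1 + r
    return x

def plug_in_means(lam, theta, n, m, R, T, N=20000, seed=0):
    """Empirical mean of each replicated X^rep_{i:m:n} at plug-in parameter values."""
    rng = random.Random(seed)
    sums = [0.0] * m
    for _rep in range(N):
        for i, v in enumerate(apt2_pareto(lam, theta, n, m, R, T, rng)):
            sums[i] += v
    return [s / N for s in sums]

observed = [2.4946, 2.7957, 2.8636, 2.9078, 3.1091, 3.1091,
            3.2218, 3.2823, 3.3769, 3.3825, 3.6346, 3.6426]  # Table 1
n, m, T = 25, 12, 3.15
R = [2] + [1] * 11
lam_hat, theta_hat = 2.050, 2.4946      # MLEs (Table 2; theta_hat = x_{1:12:25})
means = plug_in_means(lam_hat, theta_hat, n, m, R, T)
for obs, mu in zip(observed, means):
    print(f"{obs:.4f}  {mu:.4f}")
```

Plotting `observed` against `means` gives a Q-Q style display; points near the 45-degree line indicate adequate fit. Rerunning with $(\hat{\lambda}_u,\hat{\theta}_p)=(1.708, 2.441)$ gives the second set of plug-in means.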
The main goal of this paper is to propose an inference method under the APT-II censoring scheme that is more efficient than classical methods. To this end, we employed pivotal quantities. For illustration, the proposed method was applied to the Pareto distribution, yielding closed-form estimators of the unknown parameters with desirable properties such as unbiasedness and consistency. In addition, unlike the maximum likelihood and bootstrap methods, whose confidence intervals may not exist or may fail to attain the nominal level, the pivotal approach yields exact inference for the unknown parameters even when the sample size is small. We also proposed an algorithm that generates replicated data based on generalized pivotal quantities (GPQs).
The proposed methodology was evaluated through Monte Carlo simulations with small and moderate sample sizes. The results showed that the proposed estimators outperform the MLEs in terms of MSE and bias, with the unbiased and consistent estimators $\hat{\lambda}_u$ and $\hat{\theta}_u$ performing best. In addition, the proposed intervals are exact by construction, and the simulations confirmed that their CPs are very close to the nominal level. These results demonstrate the usefulness of the proposed method when the sample size is small. Finally, an analysis of the COVID-19 mortality rate in Canada demonstrated the applicability of the proposed methodology to real-world data.
In conclusion, for the Pareto distribution, the proposed methodology outperforms the classical method when the sample size is small. While this paper focused on the Pareto distribution, future studies will address several critical aspects. First, the efficiency and accuracy of the proposed methodology will be assessed in various realistic scenarios to extend its practical utility. Second, the applicability of our approach to other probability distributions will be explored, broadening the scope of the study. Third, a sensitivity analysis will be conducted to assess the robustness of the methodology, especially when the assumed distribution may be misspecified. Finally, subsequent studies will validate the performance of the approach in real-world applications, with a deeper exploration of its applicability in industrial settings.
The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this article.
The authors declare there is no conflict of interest.
[1] M. Basirat, S. Baratpour, J. Ahmadi, Statistical inferences for stress-strength in the proportional hazard models based on progressive Type-II censored samples, J. Stat. Comput. Sim., 85 (2015), 431–449. https://doi.org/10.1080/00949655.2013.824449
[2] A. S. Nik, A. Asgharzadeh, M. Z. Raqab, Estimation and prediction for a new Pareto-type distribution under progressive Type-II censoring, Math. Comput. Simulat., 190 (2021), 508–530. https://doi.org/10.1016/j.matcom.2021.06.005
[3] S. Dey, L. Wang, M. Nassar, Inference on Nadarajah-Haghighi distribution with constant stress partially accelerated life tests under progressive Type-II censoring, J. Appl. Stat., 49 (2022), 2891–2912. https://doi.org/10.1080/02664763.2021.1928014
[4] B. X. Wang, K. Yu, M. C. Jones, Inference under progressively Type-II right-censored sampling for certain lifetime distributions, Technometrics, 52 (2010), 453–460. https://doi.org/10.1198/TECH.2010.08210
[5] J. I. Seo, S. B. Kang, Pivotal inference for the scaled half logistic distribution based on progressively Type-II censored samples, Stat. Probabil. Lett., 104 (2015), 109–116. https://doi.org/10.1016/j.spl.2015.05.011
[6] J. I. Seo, S. B. Kang, H. Y. Kim, New approach for analysis of progressive Type-II censored data from the Pareto distribution, Commun. Stat. Appl. Met., 25 (2018), 569–575. https://doi.org/10.29220/CSAM.2018.25.5.569
[7] H. L. Lu, S. H. Tao, The estimation of Pareto distribution by a weighted least square method, Qual. Quant., 41 (2007), 913–926. https://doi.org/10.1007/s11135-007-9100-8
[8] H. K. T. Ng, D. Kundu, P. S. Chan, Statistical analysis of exponential lifetimes under an adaptive Type-II progressive censoring scheme, Nav. Res. Log., 56 (2009), 687–698. https://doi.org/10.1002/nav.20371
[9] M. M. A. Sobhi, A. A. Soliman, Estimation for the exponentiated Weibull model with adaptive Type-II progressive censored schemes, Appl. Math. Model., 40 (2016), 1180–1192. https://doi.org/10.1016/j.apm.2015.06.022
[10] Z. S. Ye, P. S. Chan, M. Xie, H. K. T. Ng, Statistical inference for the extreme value distribution under adaptive Type-II progressive censoring schemes, J. Stat. Comput. Sim., 84 (2014), 1099–1114. https://doi.org/10.1080/00949655.2012.740481
[11] R. Mohan, M. Chacko, Estimation of parameters of Kumaraswamy-exponential distribution based on adaptive Type-II progressive censored schemes, J. Stat. Comput. Sim., 91 (2021), 81–107. https://doi.org/10.1080/00949655.2020.1807547
[12] Z. Chen, Joint confidence region for the parameters of Pareto distribution, Metrika, 44 (1996), 191–197. https://doi.org/10.1007/BF02614065
[13] S. F. Wu, Interval estimation for a Pareto distribution based on a doubly Type-II censored sample, Comput. Stat. Data An., 52 (2008), 3779–3788. https://doi.org/10.1016/j.csda.2007.12.015
[14] J. Zhang, Simplification of joint confidence regions for the parameters of the Pareto distribution, Comput. Stat., 28 (2013), 1453–1462. https://doi.org/10.1007/s00180-012-0354-9
[15] J. H. T. Kim, S. Ahn, S. Ahn, Parameter estimation of the Pareto distribution using a pivotal quantity, J. Korean Stat. Soc., 46 (2017), 438–450. https://doi.org/10.1016/j.jkss.2017.01.004
[16] M. M. Mohie El-Din, A. R. Shafay, M. Nagy, Statistical inference under adaptive progressive censoring scheme, Comput. Stat., 33 (2018), 31–74. https://doi.org/10.1007/s00180-017-0745-z
[17] E. Cramer, G. Iliopoulos, Adaptive progressive Type-II censoring, Test, 19 (2010), 342–358. https://doi.org/10.1007/s11749-009-0167-5
[18] A. F. Karr, Probability, New York: Springer-Verlag, 1993.
[19] E. Slutsky, Über stochastische Asymptoten und Grenzwerte, Metron, 5 (1925), 3–89.
[20] S. Weerahandi, Generalized confidence intervals, In: Exact statistical methods for data analysis, New York: Springer, 1995, 143–168. https://doi.org/10.1007/978-1-4612-0825-9_6
[21] N. Balakrishnan, R. A. Sandhu, A simple simulational algorithm for generating progressive Type-II censored samples, Am. Stat., 49 (1995), 229–230. https://doi.org/10.1080/00031305.1995.10476150
[22] E. M. Almetwally, R. Alharbi, D. Alnagar, E. H. Hafez, A new inverted Topp-Leone distribution: Applications to the COVID-19 mortality rate in two different countries, Axioms, 10 (2021), 25. https://doi.org/10.3390/axioms10010025
[23] A. S. Nik, A. Asgharzadeh, A. Baklizi, Inference based on new Pareto-type records with applications to precipitation and COVID-19 data, Stat. Optim. Inf. Comput., 11 (2023), 243–257. http://dx.doi.org/10.19139/soic-2310-5070-1591