1. Introduction
Quantile-based reliability analysis is a novel approach in reliability theory that works with quantile functions rather than the traditional distribution functions. Quantile and distribution functions are mathematically equivalent ways of defining a probability distribution. For a positive random variable $T$, the conditional past lifetime (also called the inactivity time) at time $t$, $T_t = (t - T \mid T < t)$, is the dual of the residual lifetime and is the cornerstone of many studies in survival analysis and reliability theory. The α-quantile past lifetime (α-QPL) function is defined to be the α-quantile of the conditional past lifetime $T_t$ and can be simplified to
$$q_\alpha(t) = t - Q(\bar{\alpha} F(t)),$$
where $\bar{\alpha} = 1-\alpha$ and $Q(y) = \inf\{x : F(x) \ge y\}$ is the quantile function of $F$. Unnikrishnan and Vineshkumar [1] examined some properties of this measure and its relationship to the reversed hazard rate function. Shafaei and Izadkhah [2] defined the α-QPL concept for parallel systems and investigated its properties. Shafaei [3] showed that a distribution can be characterized by two suitable quantile past life functions. Mahdy [4] investigated some ordering properties of the α-QPL function and discussed the problem of estimating this measure from uncensored data using the empirical distribution function. Balmert and Jeong [5] focused on nonparametric estimation of the special case of the 0.5-QPL, referred to as the median past life function. They also compared two or more groups of data based on the ratio of their median past life functions. In addition, Balmert et al. [6] investigated a log-linear quantile regression model for inactivity time when the data are right-censored.
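To make the definition concrete, the following minimal sketch evaluates the α-QPL function in closed form, using the relation $q_\alpha(t) = t - Q(\bar{\alpha}F(t))$ that follows from the definition of $T_t$. The Weibull parameterization and the function names (`weibull_cdf`, `weibull_quantile`, `qpl`) are assumptions made for illustration only.

```python
import math

def weibull_cdf(t, shape, scale):
    """F(t) = 1 - exp(-(t/scale)^shape) for t >= 0 (assumed parameterization)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_quantile(p, shape, scale):
    """Q(p) = scale * (-log(1 - p))^(1/shape) for 0 <= p < 1."""
    return scale * (-math.log1p(-p)) ** (1.0 / shape)

def qpl(t, alpha, shape, scale):
    """alpha-quantile past lifetime: q_alpha(t) = t - Q((1 - alpha) * F(t))."""
    p = (1.0 - alpha) * weibull_cdf(t, shape, scale)
    return t - weibull_quantile(p, shape, scale)

# Median past lifetime (alpha = 0.5) at t = 2 for a unit exponential
# (Weibull with shape = 1, scale = 1).
t0 = 2.0
q_med = qpl(t0, 0.5, shape=1.0, scale=1.0)

# Sanity check of the defining property: P(t - T <= q | T < t) = alpha.
F = lambda x: weibull_cdf(x, 1.0, 1.0)
prob = (F(t0) - F(t0 - q_med)) / F(t0)
```

Here `prob` recovers α up to rounding, which is a quick way to check any closed-form or numerical implementation of $q_\alpha(t)$.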
The applicability and usefulness of the α-QPL, discussed in the last two references, motivated me to study its estimation for right-censored data and to investigate the statistical properties of the estimator. A further motivation is that the median past lifetime may be preferred to the mean past lifetime, especially when the data are heavily censored or skewed, or when the moments do not exist. However, there is very little literature on estimating the α-QPL function. In my approach, the Kaplan-Meier (KM) survival estimator is used to define the estimator as a continuous process in time. The rest of this paper is organized as follows. Section 2 reviews some preliminaries, including notation, the KM survival estimator, and related processes. Section 3 defines the estimator of the α-QPL function and investigates its weak convergence to a Gaussian process, confidence intervals for the α-QPL function, and the strong convergence of the estimator. Section 4 investigates the behavior of the estimator and the proposed confidence interval (CI) by simulating gamma and Weibull models (see, for example, Teamah Abd-Elmonem et al. [7] and Elbatal et al. [8]). Section 5 presents applications to two datasets, from the Mayo Clinic primary biliary cirrhosis study and the North Central Cancer Treatment Group. Finally, in Section 6, I conclude the paper and suggest directions for future work.
2. The KM survival estimator and related processes
Let $T_i$, $i = 1, 2, \ldots, n$, be $n$ i.i.d. lifetimes with distribution function $F$, censored by random censoring variables $C_i$ with distribution function $H$. The lifetime $T_i$ is uncensored when $T_i \le C_i$ and censored otherwise. The observations are then the $n$ pairs $(X_i, \delta_i)$, where $X_i = T_i \wedge C_i$ and $\delta_i = I(T_i \le C_i)$ is the censoring indicator. Let $G$ denote the distribution function of $X_i$. Throughout the paper, $b_F = \inf\{x : F(x) = 1\}$ denotes the upper bound of the support of $F$; $b_H$ and $b_G$ are defined similarly for $H$ and $G$. Clearly, $b_G = b_F \wedge b_H$, where $\wedge$ stands for the minimum operator. The reliability functions corresponding to $F$, $H$, and $G$, denoted by $\bar{F}$, $\bar{H}$, and $\bar{G}$, respectively, are related by
$$\bar{G}(t) = \bar{F}(t)\,\bar{H}(t).$$
A concept that proves quite useful in this investigation is the probability that an event occurs before or at $t$ and is uncensored, i.e.,
$$\tilde{F}(t) = P(X_i \le t, \delta_i = 1). \tag{2.1}$$
Let $N_i(t) = I(X_i \le t, \delta_i = 1)$ and $Y_i(t) = I(X_i \ge t)$. Then $\bar{N}(t) = \sum_{i=1}^{n} N_i(t)$ is the number of items that have failed up to and including time $t$, and $\bar{Y}(t) = \sum_{i=1}^{n} Y_i(t)$ is the number of items at risk at time $t$. Also, let $\Delta\bar{N}(t) = \bar{N}(t) - \bar{N}(t-)$. The KM estimator, also known as the product-limit (PL) estimator, of the survival function is given by
where $X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$. It is widely used for estimating the survival function from right-censored data. The PL estimator of the distribution function is simply $F_n(t) = 1 - \bar{F}_n(t)$. A useful property of this estimator is that it is asymptotically unbiased; more precisely,
This shows that the bias can be considerable for large values of t and/or heavily censored data. In the following, I discuss some necessary processes. More details can be found in Chung [9]. The PL process is defined by
Burke et al. [10] showed that there is a sequence of Wiener processes $\{W_n(t), t \ge 0\}$ such that, for any $t^* < b_G$,
where
and $\tilde{F}$ is defined in (2.1). Moreover, (2.5) implies the following law of the iterated logarithm for the PL process; see Burke et al. [10] and Csorgo and Horvath [11].
Also, Burke et al. [10] showed that
in which
and $\bar{G}_n$ and $\tilde{F}_n$ are the corresponding empirical functions, i.e.,
and
Applying the PL estimator of the distribution function, the PL quantile process is defined by
The PL-normed quantile process is defined by
where $f$ is the density function of $F$. Aly et al. [12] showed that for $p^* \in (0,1)$ with $Q(p^*) < b_G$, there is a sequence of Wiener processes $\{W_n(t), t \ge 0\}$ such that
The uniform PL-process is defined by
where $F_n^*(y) = F_n(Q(y))$ can be computed as the PL estimator based on the pairs $(W_i, \delta_i)$, with $W_i = F(X_i)$. Then,
and the uniform PL-quantile process is defined to be
Let $b_G^*$ be the upper bound of the support of the distribution of $W_i$. Aly et al. [12] proved that for $y_0 < b_G^*$ such that $Q(y_0) < b_G$, there are sequences of Wiener processes $\{W_n(t), t \ge 0\}$ such that
and
These equations imply, respectively, the following laws of the iterated logarithm:
and
For more details about these processes, refer to Chung [9].
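As a computational companion to the PL estimator reviewed in this section, the following minimal sketch computes the standard product-limit form, in which the survival estimate is multiplied by $1 - \Delta\bar{N}(s)/\bar{Y}(s)$ at each distinct uncensored time $s$. The function name `kaplan_meier` and the toy data are assumptions for illustration, and no convention beyond the largest observation is imposed.

```python
def kaplan_meier(times, events):
    """Return the distinct uncensored times and the KM survival estimate
    just after each of them. `events[i]` is 1 if uncensored, 0 if censored."""
    data = sorted(zip(times, events))
    n = len(data)
    at_risk = n          # Y(t): number of observations with X_i >= t
    surv = 1.0
    out_t, out_s = [], []
    i = 0
    while i < n:
        t = data[i][0]
        deaths = 0       # Delta N(t): uncensored failures exactly at t
        tied = 0         # all observations (failed or censored) at t
        while i < n and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            out_t.append(t)
            out_s.append(surv)
        at_risk -= tied
    return out_t, out_s

# Toy right-censored sample: the observations at 2 and 5 are censored.
km_times, km_surv = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
# km_times is [1, 3, 4]; km_surv is approximately [0.8, 0.5333, 0.2667]
```

With no censoring, the same function reduces to the empirical survival function, which gives a simple check of an implementation.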
3. The α-QPL estimator
The estimator of the α-QPL function is defined as follows:
$$q_{\alpha,n}(t) = t - Q_n(\bar{\alpha} F_n(t)). \tag{3.1}$$
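The following minimal sketch shows one way to compute a plug-in estimate of the form $t - Q_n(\bar{\alpha}F_n(t))$, with $F_n$ the PL estimator of the distribution function and $Q_n(p) = \inf\{x : F_n(x) \ge p\}$. The function names (`km_cdf`, `qpl_estimate`) and the choice to return `None` when $F_n(t) = 0$ are assumptions for illustration.

```python
import bisect

def km_cdf(times, events):
    """Jump points (distinct uncensored times) and the KM estimate F_n at each."""
    data = sorted(zip(times, events))
    at_risk, surv = len(data), 1.0
    xs, cdf = [], []
    i = 0
    while i < len(data):
        t, deaths, tied = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            tied += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            xs.append(t)
            cdf.append(1.0 - surv)
        at_risk -= tied
    return xs, cdf

def qpl_estimate(t, alpha, times, events):
    """Plug-in estimate t - Q_n((1 - alpha) * F_n(t)) for 0 < alpha < 1."""
    xs, cdf = km_cdf(times, events)
    k = bisect.bisect_right(xs, t)      # number of jump points <= t
    if k == 0:
        return None                     # F_n(t) = 0: no failure observed by t
    p = (1.0 - alpha) * cdf[k - 1]      # (1 - alpha) * F_n(t)
    j = bisect.bisect_left(cdf, p)      # first jump point with F_n >= p
    return t - xs[j]

# Toy sample: F_n(4.5) = 11/15, so p = 11/30 and Q_n(p) = 3.
q_hat = qpl_estimate(4.5, 0.5, [1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Because $\bar{\alpha}F_n(t) \le F_n(t)$, the quantile $Q_n$ is always attained at a jump point not exceeding $t$, so the estimate is nonnegative whenever it is defined.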
Similar to the approach adopted for the α-quantile residual life (α-QRL) function by Csorgo and Csorgo [13] and Chung [9], the scaled α-QPL process $r_n^{\alpha}(t)$ is defined by
The following result shows that this process converges weakly to a Gaussian process.
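Theorem 1. Assume that $0 < \alpha < 1$, $0 \le t < b_G$, $Q(\bar{\alpha}F(t)) < b_G$, and the density-quantile function $f(Q(\cdot))$ is continuous at the point $\bar{\alpha}F(t)$. Then, I have
where $\sigma^2_{\alpha,t} = (1 - \bar{\alpha}F(t))^2\, d(Q(\bar{\alpha}F(t))) + \bar{\alpha}^2 \bar{F}^2(t)\, d(t) - 2\bar{\alpha}\,(1 - \bar{\alpha}F(t))\,\bar{F}(t)\, d(Q(\bar{\alpha}F(t)))$.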
Proof. It is easily checked that $Q_n^*(p) = F(Q_n(p))$ for all $0 < p < 1$. Then, replacing $F(Q_n(\bar{\alpha}F_n(t)))$ with $Q_n^*(\bar{\alpha}F_n(t))$ and applying a Taylor expansion to the function $Q$, for $0 < t < X_{(n)}$, I have
where $\delta_{n,t,\alpha}$ lies between $\bar{\alpha}F(t)$ and $Q_n^*(\bar{\alpha}F_n(t))$, and $s_n^{\alpha}(t) = \sqrt{n}\,(\bar{\alpha}F(t) - Q_n^*(\bar{\alpha}F_n(t)))$. Clearly, $\delta_{n,t,\alpha} \to \bar{\alpha}F(t)$ as $n \to \infty$, and by continuity of the density-quantile function,
Slutsky's theorem then implies that $r_n^{\alpha}(t)$ and $s_n^{\alpha}(t)$ have the same asymptotic distribution. Now, by adding and subtracting $\bar{\alpha}F_n(t)$, I have
Then, applying (3.5), (2.12) and (2.13), I have
and in turn
Note that $Q(\bar{\alpha}F(t)) \le t$, so $d(Q(\bar{\alpha}F(t))) \le d(t)$, and by the properties of Wiener processes, it follows that
where $\sigma^2_{\alpha,t}$ is as defined above.
Let $z_{p/2}$ be the upper $p/2$-quantile of the standard normal distribution, i.e., for a standard normal random variable $Z$, $P(Z > z_{p/2}) = p/2$. Using Theorem 1, I can construct a CI for the α-QPL function, as described in the following theorem.
Theorem 2. Let $0 < \alpha < 1$, $0 \le t < b_G$, $Q(\bar{\alpha}F(t)) < b_G$, and let the density-quantile function $f(Q(\cdot))$ be continuous at the point $\bar{\alpha}F(t)$. In addition, assume that $f_n$ is a consistent estimator of the density function $f$ in a neighborhood of $Q(\bar{\alpha}F(t))$. Then, an asymptotic $100(1-p)\%$ CI for $q_\alpha(t)$ is
where
and
A consistent estimator of the density function $f$ is needed to calculate this CI. Density estimation is a fundamental problem in statistics that has been studied extensively. For censored data specifically, see Blum and Susarla [14], Burke and Horvath [15], Mielniczuk [16], Marron and Padgett [17], Lo et al. [18], and many other studies. The following theorem provides a different approach to constructing a CI for the α-QPL function that does not depend on the density function. The idea is similar to that of Csorgo and Csorgo [13] and Chung [9] for the α-quantile residual life function.
Theorem 3. Let $0 < \alpha < 1$, let $t$ satisfy the conditions of Theorem 2, and let the density-quantile function $f(Q(\cdot))$ be continuous at the point $\bar{\alpha}F(t)$. Then, an asymptotic $100(1-p)\%$ CI for $q_\alpha(t)$ is
where $\sigma^2_{\alpha,t,n}$ is defined in Theorem 2.
Proof. For simplicity, take $u_n = \bar{\alpha}F_n(t) - n^{-1/2} z_{p/2} \sqrt{\sigma^2_{\alpha,t,n}}$ and $v_n = \bar{\alpha}F_n(t) + n^{-1/2} z_{p/2} \sqrt{\sigma^2_{\alpha,t,n}}$. I should have
This probability can be written as follows:
where
In the last equality, I used a Taylor expansion of $Q$, and $\delta_n$ lies between $u_n$ and $\bar{\alpha}F(t)$. Similarly,
where $\delta'_n$ lies between $v_n$ and $\bar{\alpha}F(t)$. Note that $u_n$, $v_n$, $\delta_n$, and $\delta'_n$ all converge to $\bar{\alpha}F(t)$ as $n \to \infty$. Now, combining (3.13) and (3.14), I have
The second equality is due to the fact that $f(Q(v_n))/f(Q(\delta'_n)) \to 1$ and $f(Q(u_n))/f(Q(\delta_n)) \to 1$ as $n \to \infty$. The third equality follows from the fact that $u_n$ and $v_n$ converge to $\bar{\alpha}F(t)$ and $\sigma^2_{\alpha,t,n}$ converges to $\sigma^2_{\alpha,t}$ as $n \to \infty$. On the other hand, the asymptotic distribution of $-\rho_n(\bar{\alpha}F(t)) + \bar{\alpha}\,e_n(F(t))$ is the same as that of
which, by (3.6), is the same as the distribution of $s_n^{\alpha}(t)$ as $n \to \infty$. Thus, by Theorem 1, I have
and this completes the proof.
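The construction used in the proof can be sketched numerically. The proof works with the endpoints $u_n$ and $v_n$ around $\bar{\alpha}F_n(t)$, which suggests an interval of the form $(t - Q_n(v_n),\, t - Q_n(u_n))$; this reading, the function names, and the toy inputs below are assumptions for illustration. The variance estimate $\sigma^2_{\alpha,t,n}$ is passed in as an argument rather than computed.

```python
import bisect
from statistics import NormalDist

def step_quantile(p, xs, cdf):
    """Q_n(p) = inf{x : F_n(x) >= p} for a step CDF with jumps at xs.
    Returns 0.0 for p <= 0 and None if p exceeds the largest value of F_n."""
    if p <= 0:
        return 0.0
    k = bisect.bisect_left(cdf, p)
    return xs[k] if k < len(xs) else None

def density_free_ci(t, alpha, Fn_t, sigma2, n, xs, cdf, p=0.05):
    """Sketch of an interval (t - Q_n(v_n), t - Q_n(u_n)); sigma2 is a
    user-supplied estimate of the asymptotic variance (assumption)."""
    z = NormalDist().inv_cdf(1.0 - p / 2.0)      # upper p/2 normal quantile
    half = z * (sigma2 / n) ** 0.5
    u_n = (1.0 - alpha) * Fn_t - half
    v_n = (1.0 - alpha) * Fn_t + half
    q_upper = step_quantile(v_n, xs, cdf)        # larger quantile gives the lower end
    q_lower = step_quantile(u_n, xs, cdf)
    lower = None if q_upper is None else t - q_upper
    upper = None if q_lower is None else t - q_lower
    return lower, upper

# Toy step CDF with jumps at 1, 3, 4 and an assumed variance estimate of 0.25.
xs, cdf = [1, 3, 4], [0.2, 7 / 15, 11 / 15]
lo, hi = density_free_ci(4.5, 0.5, Fn_t=11 / 15, sigma2=0.25, n=25, xs=xs, cdf=cdf)
```

An endpoint is reported as `None` when $v_n$ exceeds the largest value attained by $F_n$, which can happen for small samples or heavy censoring; how to handle that case is a design choice not addressed here.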
3.2. Strong convergence
Here, I show that the process defined in (3.1) converges almost surely to $q_\alpha(t)$ under some mild conditions. The following result shows that the scaled process $r_n^{\alpha}(t)$ can be approximated by a zero-mean Gaussian process.
Let $G_n^{\alpha}(t)$ be defined by
where $\{W_n(t), t \ge 0\}$ is a sequence of Wiener processes converging to a Wiener process $W(t)$. Also, let $G^{\alpha}(t)$ be defined by
Then, for each $n = 1, 2, \ldots$, $\{G_n^{\alpha}(t), t \ge 0\}$ has the same distribution as $\{G^{\alpha}(t), t \ge 0\}$.
Theorem 4. Let $0 < \alpha < 1$ be fixed, let $b \in (0, b_G)$ be such that $Q(\bar{\alpha}F(b)) < b_G$, let $F$ be twice differentiable on $(0, b_F)$, and let $f$ be positive on the interval $(0, Q(\bar{\alpha}))$. Then, as $n \to \infty$, I have
Proof. By (3.4), I have
where $\delta_{n,t,\alpha}$ lies between $\bar{\alpha}F(t)$ and $Q_n^*(\bar{\alpha}F_n(t))$, and $s_n^{\alpha}(t) = \bar{\alpha}F(t) - Q_n^*(\bar{\alpha}F_n(t))$. As demonstrated in the proof of Theorem 1,
Now, applying (2.12) and (2.13), I find that
Recall the relations (2.6) and (2.15) that state, respectively,
and, since $Q(\bar{\alpha}F(b)) < b_G$,
I have
The last equality follows from (3.23) and (3.24).
Since the density function $f$ is differentiable and positive on $(0, Q(\bar{\alpha}))$, applying (3.25), I have
and
Subsequently, the proof follows from (3.20), (3.22), (3.26), and (3.27).
The following corollary follows from (2.14), (2.15), (3.2), (3.4) and (3.21).
Corollary 1. Under the conditions of Theorem 4, I have
This implies that $q_{\alpha,n}(t)$ converges almost surely to $q_\alpha(t)$.
4. Simulation
The properties of the α-QPL function estimator and CI, defined by Theorem 3, were investigated through simulation studies. For this purpose, two important distributions, Weibull and Gamma, are considered, with the following distribution functions:
and
where $\gamma(\alpha, x) = \int_0^x y^{\alpha-1} e^{-y}\,dy$ is the lower incomplete gamma function.
It is assumed that the censoring random variable $C$ follows a uniform distribution on the interval $(0, M)$. Given a target censoring proportion $p$, I set $M = E(T)/p$, where $T$ is the Weibull or gamma lifetime under consideration. Some parameter values are selected, and r = 5000 replicates of samples of size n = 25 or 50 are simulated. For each sample, the median past lifetime function is computed at the four decile points $q_{0.1}$, $q_{0.2}$, $q_{0.4}$, and $q_{0.6}$. Table 1 shows the results for the Weibull model, including the bias (B) and the mean squared error (MSE). Table 2 presents the simulation results for the gamma model.
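The data-generating step of this design can be sketched as follows for the Weibull case. The sketch draws lifetimes, draws uniform censoring times on $(0, M)$ with $M = E(T)/p$, and forms the pairs $(X_i, \delta_i)$. The function name, the Weibull parameterization, the seed, and the parameter values are assumptions for illustration and are not taken from the tables.

```python
import math
import random

def simulate_censored_weibull(n, shape, scale, p, rng):
    """Return (X_i, delta_i) with X_i = min(T_i, C_i) and delta_i = I(T_i <= C_i),
    where C_i ~ Uniform(0, M) and M = E(T) / p."""
    mean_t = scale * math.gamma(1.0 + 1.0 / shape)   # E(T) for the Weibull model
    M = mean_t / p
    xs, deltas = [], []
    for _ in range(n):
        t = rng.weibullvariate(scale, shape)         # arguments: (scale, shape)
        c = rng.uniform(0.0, M)
        xs.append(min(t, c))
        deltas.append(1 if t <= c else 0)
    return xs, deltas

rng = random.Random(12345)
sample_x, sample_delta = simulate_censored_weibull(20000, shape=2.0, scale=1.0, p=0.2, rng=rng)
censor_rate = 1.0 - sum(sample_delta) / len(sample_delta)
```

For uniform censoring, the censoring probability equals $E[\min(T, M)]/M$, which is close to $E(T)/M = p$ when $T$ rarely exceeds $M$; the observed `censor_rate` in a large sample should therefore be near the target.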
From these tables I observe the following:
● When n increases, the MSE decreases significantly, which indicates that the α-QPL estimator is consistent.
● As expected by (2.3), for large deciles, the MSE is affected by the censoring percentage.
In a further simulation study, summarized in Table 3, the coverage probability (CP) of the CI defined by Theorem 3 is investigated. Weibull and gamma distributions, each with a set of parameters, were assumed as the underlying models, with three censoring rates of 0.05, 0.20, and 0.30 and three points $q_{0.1}$, $q_{0.2}$, and $q_{0.4}$. In each run, r = 5000 replicates of samples of size n = 50 or 100 were simulated, and the CI of the 0.5-QPL at the points considered was calculated. The CP, i.e., the proportion of CIs containing the true 0.5-QPL, and the mean length (ML) of the CIs were recorded. The true values of the 0.5-QPL are also shown in the table for comparison.
The results show that the coverage of the CI is high and acceptable, but the ML increases significantly with time. Moreover, the ML increases slightly with the censoring percentage.
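The two summary measures used in this study can be computed from the simulated intervals as in the following minimal sketch. The function name and the four toy intervals are assumptions for illustration; in the study itself, one interval would be produced per replicate.

```python
def coverage_and_mean_length(intervals, true_value):
    """CP: proportion of intervals (lo, hi) containing true_value.
    ML: average of hi - lo over all intervals."""
    hits = sum(1 for lo, hi in intervals if lo <= true_value <= hi)
    lengths = [hi - lo for lo, hi in intervals]
    return hits / len(intervals), sum(lengths) / len(lengths)

# Four toy intervals and a hypothetical true 0.5-QPL value of 1.5.
toy_intervals = [(0.5, 3.5), (1.0, 2.0), (2.0, 4.0), (0.0, 1.0)]
cp, ml = coverage_and_mean_length(toy_intervals, 1.5)
```

In the toy example, two of the four intervals contain 1.5, and the lengths are 3, 1, 2, and 1.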
5. Applications
5.1. Mayo Clinic study on primary biliary cirrhosis data
The dataset of the Mayo Clinic study on primary biliary cirrhosis (PBC) presented by Fleming and Harrington [19] was used. The dataset is also available in the 'survival' package of the statistical programming language R and has been analyzed by many authors. I focused on the variable time, i.e., the number of days between registration and the earliest of death, liver transplantation, or the study analysis in July 1986. More than 60% of the observations were censored, resulting in a minimum KM survival estimate of approximately 0.3534. The KM survival function is plotted in Figure 1 (left). Each censored observation is marked with a plus sign on the survival curve.
Three functions, the 0.25-QPL, 0.5-QPL, and 0.75-QPL, namely the first, second, and third quartiles of the past lifetime, were estimated within the range of the data. The 0.5-QPL function is also referred to as the median past lifetime function.
Figures 1 (right) and 2 plot these functions and show that they are increasing. For the median past lifetime, specifically at the two selected times 1077 and 4079, I have $q_{0.5}(1077) = 480$ and $q_{0.5}(4079) = 2068$. This means that, among the patients who experienced the event before day 1077, half are expected to have experienced it after day 1077 − 480 = 597 and half before it. Similarly, among the patients who experienced the event before day 4079, half are expected to have experienced it after day 4079 − 2068 = 2011 and half before it. In addition, the 95% CIs defined in Theorem 3 are calculated and shown in the graphs.
5.2. North central cancer treatment group lung cancer data
This dataset contains the survival times of 228 advanced lung cancer patients from the North Central Cancer Treatment Group (NCCTG) and is available in the 'survival' package of the statistical programming language R. The column 'time' of this dataset gives the time to death or censoring for each patient, and 'status' is the censoring indicator. About 27% of the death times are censored. Figure 3 (left) shows the KM survival function. The first to third quartile functions, 0.25-QPL, 0.5-QPL, and 0.75-QPL, are plotted in Figure 3 (right) and Figure 4 and are increasing.
6. Conclusions
The α-QPL function is a useful alternative to the mean inactivity time function in reliability theory and survival analysis. There are situations where the α-QPL function is preferable to the mean inactivity time function, e.g., when the data are heavily censored or the underlying distribution is skewed or has a heavy right tail. The Kaplan-Meier product-limit estimator of the survival function was applied to define an estimator for the α-QPL function. The proposed estimator converges weakly to a Gaussian process, and this result is used to construct a confidence interval that depends on a density estimator. Another confidence interval is proposed that does not depend on a density estimator. Simulation results suggest that the proposed estimator of the α-QPL function is consistent. The coverage probability and mean length of the confidence intervals were also investigated. The applicability of the estimator and the proposed confidence interval is demonstrated using two real datasets. Further properties and applications of the α-QPL function can be considered in future research. In particular, extending the α-QPL function and its estimation to a multivariate context is interesting and remains an open problem.
Use of AI tools declaration
The author declares he has not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgements
The author thanks the two anonymous reviewers for their careful reading of the manuscript and their many insightful comments and suggestions. This work is supported by Researchers Supporting Project number (RSP2024R392), King Saud University, Riyadh, Saudi Arabia.
Conflict of interest
There are no conflicts of interest.