Citation: Asamh Saleh M. Al Luhayb. Nonparametric bootstrap methods for hypothesis testing in the event of double-censored data[J]. AIMS Mathematics, 2024, 9(2): 4649-4664. doi: 10.3934/math.2024224
Abstract
This paper illustrates how nonparametric bootstrap methods for double-censored data can be used to conduct hypothesis tests, such as tests on quartiles. In simulation studies, the smoothed bootstrap (SB) method performed better than Efron's method in most scenarios, particularly for small datasets: it yielded smaller discrepancies between the actual and nominal error rates.
1. Introduction
The bootstrap method, introduced by [1], is a nonparametric statistical method for quantifying the variability of sample estimates. It has been widely used in the literature for a variety of statistical problems [2] because it is easy to apply and gives good results. When little is known about a suitable distribution, the bootstrap method can be of great practical use [3].
For univariate real-valued data, [1] introduced the bootstrap method, which has since been used in many real applications; see [2,3,4] for more details. The method works as follows: multiple resamples of size $n$ are drawn from the original dataset, and the function of interest is computed for each bootstrap sample. The empirical distribution of the results can be used as a good proxy for the distribution of the function of interest. In the case of finite support, [5] smoothed the bootstrap method by linear interpolation between consecutive observations. Banks' bootstrap method starts by ordering the observations of the original sample, which creates $n+1$ intervals. Each interval is assigned probability $1/(n+1)$. To generate one Banks' bootstrap sample, $n$ intervals are resampled, then one observation is drawn uniformly from each chosen interval; if an interval is chosen more than once, one observation is drawn from it for each time it was chosen. Banks' bootstrap method therefore samples from the whole support, and ties do not occur in the bootstrap samples. This is contrary to Efron's method, where the process is restricted to resampling from the original dataset [1]. For underlying distributions with infinite support, [6] generalized Banks' bootstrap method by assuming distribution tail(s) for the last interval(s).
For univariate right-censored data, [7] presented the bootstrap method and the method is widely used for survival analysis; see [5,8,9,10]. This bootstrap version is very similar to the method presented for univariate real-valued data. The method creates multiple bootstrap samples of size n by resampling from the original sample, and the function of interest is computed based on each bootstrap sample. The empirical distribution of those resulting values can be used as a good proxy for the distribution of the function of interest. [11] generalized Banks' bootstrap method based on the right-censoring A(n) assumption [12]. The generalized bootstrap method has been introduced for better results; see [11,13] for more presentation details.
In 1986, [8] introduced the bootstrap method for bivariate data, and the method is built by the same technique used for univariate real-valued data. Multiple bootstrap samples are generated by resampling from the original dataset, and the function of interest is computed based on each bootstrap sample. The empirical distribution of the resulting values can be a good proxy for the distribution of the function of interest. However, Efron's bootstrap method provides poor results for small datasets, and this motivated [14] to propose three smoothed bootstrap (SB) methods. The SB methods are built based on nonparametric predictive inference with parametric and nonparametric copulas and uniform kernels. Those bootstrap methods have been introduced for better results; see [13,14] for more presentation details.
In the literature, classical statistical methods have been widely used for testing statistical hypotheses, and they are regarded as the standard methods even though their underlying assumptions are often not met, particularly when the observed dataset is complicated. To avoid these mathematical assumptions, [8,15,16] used Efron's bootstrap method, which is regarded as the classical or standard bootstrap method, to test statistical hypotheses. Efron's bootstrap method is easy to implement and gives good approximations, but it may provide poor results for small datasets, and ties can occur in the bootstrap samples. These drawbacks motivate the use of a smoothed bootstrap (SB) method for double-censored data to test statistical hypotheses, and a comparison of its results with those of Efron's method for double-censored data. Tests based on double-censored data are considered in this paper.
The rest of the paper is organized as follows. In Section 2, several bootstrap methods for univariate real-valued data, univariate right-censored data, and bivariate real-valued data are reviewed. Section 3 introduces Efron's bootstrap technique for data including double-censored observations, along with an SB technique. In Section 4, Efron's bootstrap method and the suggested bootstrap method are compared through simulations in terms of the Type Ⅰ error rates of quartiles' hypothesis tests. The final section presents some conclusions and directions for future research.
2. Bootstrap methods for different data types
This section presents several bootstrap methods for univariate real-valued data, univariate right-censored data, and bivariate real-valued data.
2.1. Bootstrap methods for univariate real-valued data
This section describes Efron's bootstrap method and Banks' bootstrap method for data including real-valued observations only [1,5]. Let $F$ be a continuous distribution defined on the interval $[a,b]$ and let $\theta(F)$ be the function of interest. Furthermore, let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random quantities from the distribution $F$, with corresponding observations $x_1, x_2, \ldots, x_n$.
Efron's bootstrap method [1] is a nonparametric method proposed to measure the variability of sample estimates. It relies on the empirical distribution function of the original sample, which means that each observation has the same probability of being selected. A large number $B$ of resamples of size $n$ are drawn with replacement from the original sample, and the function of interest, $\hat{\theta}$, is calculated for each resample. This provides $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_B$, whose empirical distribution approximates the sampling distribution of $\theta(F)$. This method is used for hypothesis testing and provides good results; see [2] for more details.
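The resampling scheme just described can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the paper; the function name `efron_bootstrap`, the toy data, and the choice of the median as the function of interest are assumptions made for the example.

```python
import random
import statistics


def efron_bootstrap(sample, statistic, B=1000, seed=None):
    """Return B bootstrap replicates of `statistic` (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(B):
        # Draw n observations with replacement; each has probability 1/n.
        resample = rng.choices(sample, k=n)
        replicates.append(statistic(resample))
    return replicates


# Toy data, chosen only for illustration.
data = [0.8, 1.1, 1.9, 2.4, 3.0, 3.7, 4.2, 5.5]
reps = efron_bootstrap(data, statistics.median, B=500, seed=1)
```

The list `reps` plays the role of $\hat{\theta}_1, \ldots, \hat{\theta}_B$; because every resample is built from the original values, the replicates can never fall outside the observed data range.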
David L. Banks [5] introduced an SB method for univariate real-valued data. The original data points are ordered as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, and the sample space $[a,b]$ is divided by the observations into $n+1$ intervals, where the end points $x_{(0)}$ and $x_{(n+1)}$ are equal to $a$ and $b$, respectively. Each interval $(x_{(i)}, x_{(i+1)})$, for $i=0,1,2,\ldots,n$, is assigned probability $1/(n+1)$. To generate a bootstrap sample, we sample intervals $n$ times with replacement, then draw one observation uniformly from each sampled interval. Based on the bootstrap sample, we calculate the function of interest. This is repeated $B$ times to create $B$ bootstrap samples and obtain the statistics $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_B$, whose empirical distribution approximates the sampling distribution of $\theta(F)$. In [13], Banks' bootstrap method is used for hypothesis testing and compared to Efron's bootstrap method. In those comparisons, Banks' bootstrap method provides better results than Efron's bootstrap method, especially for small datasets.
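As a rough illustration of Banks' scheme, the sketch below draws one smoothed bootstrap sample on a finite support $[a,b]$. The function name `banks_bootstrap_sample` and the toy values are hypothetical, and the code assumes no ties in the data.

```python
import random


def banks_bootstrap_sample(sample, a, b, rng):
    """Draw one smoothed bootstrap sample on the finite support [a, b]."""
    n = len(sample)
    # The end points x(0)=a and x(n+1)=b plus the ordered data give n+1 intervals.
    knots = [a] + sorted(sample) + [b]
    draws = []
    for _ in range(n):
        i = rng.randrange(n + 1)  # each interval has probability 1/(n+1)
        draws.append(rng.uniform(knots[i], knots[i + 1]))
    return draws


rng = random.Random(2)
boot = banks_bootstrap_sample([1.5, 3.2, 4.8, 7.1], a=0.0, b=10.0, rng=rng)
```

Unlike Efron's resampling, the values in `boot` are drawn from continuous uniform distributions on the intervals, so repeated values occur with probability zero and values between or beyond the observations (within $[a,b]$) are possible.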
2.2. Bootstrap methods for univariate right-censored data
This section presents Efron's bootstrap method [7] and the SB method for right-censored data [9,10,11,13]. Let $T_1, T_2, \ldots, T_n$ be independent and identically distributed event random variables from a distribution $F$ supported on $\mathbb{R}^+$, and let $C_1, C_2, \ldots, C_n$ be independent and identically distributed right-censoring random variables from a distribution $G$ supported on $\mathbb{R}^+$. Furthermore, let $(X_1,D_1), (X_2,D_2), \ldots, (X_n,D_n)$ be the right-censored random variables, where each pair is derived by
$$X_i = \min(T_i, C_i), \tag{2.1}$$
$$D_i = \begin{cases} 1 & \text{if } T_i \le C_i \text{ (uncensored)}, \\ 0 & \text{if } T_i > C_i \text{ (censored)}, \end{cases} \tag{2.2}$$
where $i=1,2,\ldots,n$. Let $(x_1,d_1), (x_2,d_2), \ldots, (x_n,d_n)$ be the observations of the corresponding random quantities $(X_1,D_1), (X_2,D_2), \ldots, (X_n,D_n)$, and let $\theta(F)$ be the function of interest, which can be estimated by $\theta(\hat{F})$.
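Equations (2.1) and (2.2) translate directly into code. The short sketch below builds right-censored pairs from simulated event and censoring times; the helper name `right_censor` and the exponential rates are assumptions chosen only for illustration.

```python
import random


def right_censor(event_times, censoring_times):
    """Apply Eqs (2.1) and (2.2): return the pairs (x_i, d_i)."""
    pairs = []
    for t, c in zip(event_times, censoring_times):
        x = min(t, c)           # Eq (2.1)
        d = 1 if t <= c else 0  # Eq (2.2): 1 = uncensored, 0 = censored
        pairs.append((x, d))
    return pairs


rng = random.Random(0)
T = [rng.expovariate(1.0) for _ in range(10)]  # event times (assumed rate)
C = [rng.expovariate(0.3) for _ in range(10)]  # censoring times (assumed rate)
observed = right_censor(T, C)
```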
Bradley Efron [7] proposed a nonparametric bootstrap method for data including right-censored observations. This bootstrap technique is nearly identical to the method proposed for univariate real-valued data. The empirical distribution function of the original sample is used, so each observation has probability $1/n$, regardless of whether it is an event or a censored observation. A large number $B$ of bootstrap samples of size $n$ are generated by sampling with replacement from the original dataset, and the function of interest is calculated for each bootstrap sample. These steps yield the values $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_B$, whose empirical distribution can be a good estimate of the sampling distribution of $\theta(F)$. This bootstrap method has been used to test equality of average lifetimes over two populations [17], with good results.
The SB method for right-censored data is presented in [9,10,11,13]. This method generalizes Banks' bootstrap method to right-censored data, and it is built on the generalization of the $A_{(n)}$ assumption proposed for data including right-censored observations by [12]. To implement this bootstrap method, the data support is partitioned into $n+1$ intervals by the original data, and the right-censored $A_{(n)}$ assumption is used to assign specific probabilities to those intervals. For one bootstrap sample, we resample $n$ intervals with the assigned probabilities, and from each interval we sample one observation. Repeating these steps $B$ times creates $B$ bootstrap samples, and the function of interest is computed for each bootstrap sample. This gives the values $\hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_B$, whose empirical distribution is used to estimate the sampling distribution of $\theta(F)$. [13] used the SB method for hypothesis testing and compared it to Efron's bootstrap method. In those comparisons, the SB method provides better results than Efron's bootstrap method, especially for small and medium datasets.
The SB method and Efron's bootstrap method for right-censored data differ in several aspects. First, Efron's bootstrap method uses the empirical distribution to create the bootstrap samples while the SB method uses the right-censored A(n) assumption. Second, Efron's bootstrap samples often include ties and right-censored observations due to the resampling process, and this is not the case with the SB samples. Third, Efron's method resamples from the original sample; so no observation will be out of the data, contrary to the SB method, which allows it to sample from the whole data range.
2.3. Bootstrap methods for bivariate real-valued data
This section presents Efron's bootstrap method [8] and three SB methods for bivariate real-valued data [13,14]. Let the random quantities $(X_i,Y_i) \in \mathbb{R}^2$, for $i=1,2,\ldots,n$, be independent and identically distributed with distribution $H$, and let the observation corresponding to $(X_i,Y_i)$ be denoted by $(x_i,y_i)$. Furthermore, let the function of interest be $\theta(H)$, which can be estimated by $\theta(\hat{H})$.
Bradley Efron and Robert Tibshirani [8] used the empirical distribution to implement the bootstrap. Multiple bootstrap samples, e.g., B=1000, of size n are created by resampling with equal probability from the observed data, and based on each bootstrap sample, the function of interest is calculated. This leads to B resulting values, and the empirical distribution of these B values is used as a proxy for the distribution of the function of interest; this is the same basic idea as for univariate real-valued data. Several references use this bootstrap method for hypothesis testing; see [18,19,20] for more details.
In [13,14], the authors introduced an SB method based on a semi-parametric predictive method proposed by [21,22]. The semi-parametric predictive method divides the sample space into $(n+1)^2$ squares by the original sample and assigns a certain probability to each square. To create one bootstrap sample, $n$ squares are resampled with the assigned probabilities, and from each chosen square, we sample one observation. This technique is repeated multiple times, e.g., $B=1000$, and the function of interest is calculated for each bootstrap sample. This leads to $B$ resulting values, whose empirical distribution is used to estimate the distribution of the function of interest.
In [13,14], the authors introduced another SB method based on a nonparametric predictive method proposed by [22,23]. The nonparametric predictive method partitions the sample space into $(n+1)^2$ squares by the observed data points and assigns a certain probability to each square. To create one bootstrap sample, $n$ squares are resampled with the assigned probabilities, and from each chosen square, we sample one observation. This technique is repeated multiple times, e.g., $B=1000$, and the function of interest is calculated for each bootstrap sample. This leads to $B$ resulting values, whose empirical distribution is a good estimate of the distribution of the function of interest.
Also in [13,14], the authors introduced a third SB method, a smoothed version of Efron's bootstrap method based on uniform kernels. Each data point is surrounded by a block of size $b_X \times b_Y$, with the observation located at the center of its block. To create one bootstrap sample, $n$ blocks are sampled with replacement, and from each chosen block, we sample one observation uniformly. This procedure is iterated multiple times, e.g., $B=1000$, and the function of interest is calculated for each bootstrap sample. This leads to $B$ resulting values, whose empirical distribution is a good estimate of the distribution of the function of interest.
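The uniform-kernel method is the simplest of the three to sketch. The code below is an illustrative assumption of how one bootstrap sample could be drawn; the function name `kernel_smoothed_sample` is hypothetical, and the block sizes are supplied by the user because the text does not state how $b_X$ and $b_Y$ are chosen.

```python
import random


def kernel_smoothed_sample(points, b_x, b_y, rng):
    """Draw one smoothed bootstrap sample from bivariate data."""
    n = len(points)
    sample = []
    for _ in range(n):
        # Choose a block (equivalently, its central data point) with probability 1/n.
        x, y = rng.choice(points)
        # Draw uniformly inside the b_x-by-b_y block centred at (x, y).
        sample.append((x + rng.uniform(-b_x / 2, b_x / 2),
                       y + rng.uniform(-b_y / 2, b_y / 2)))
    return sample


pts = [(1.0, 2.0), (2.5, 1.0), (4.0, 3.5)]
boot_xy = kernel_smoothed_sample(pts, b_x=0.4, b_y=0.6, rng=random.Random(3))
```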
3. Bootstrap methods for double-censored data
This section introduces two bootstrap techniques for double-censored data. Let T1,T2,…,Tn be independent and identically distributed event random variables from a distribution G supported on R+ and let RC1,RC2,…,RCn be independent and identically distributed right-censored random variables from a distribution H supported on R+. Furthermore, let LC1,LC2,…,LCn be independent and identically distributed left-censored random variables from a distribution F supported on R+ and let (X1,D1),(X2,D2),…,(Xn,Dn) be the double-censored random variables, where the pairs (X1,D1),(X2,D2),…,(Xn,Dn) are derived by
Let $(x_1,d_1), (x_2,d_2), \ldots, (x_n,d_n)$ be the observations of the corresponding random quantities $(X_1,D_1), (X_2,D_2), \ldots, (X_n,D_n)$, and let $\theta(G)$ be the function of interest, which can be estimated by $\theta(\hat{G})$.
3.1. Efron's bootstrap method
Building on Efron's bootstrap methods for univariate real-valued data, univariate right-censored data, and bivariate real-valued data, Efron's bootstrap method can be generalized to univariate double-censored data by using the empirical distribution function. The empirical distribution function gives probability $1/n$ to each observation, regardless of the observation type.
3.1.1. Efron's bootstrap algorithm
Efron's bootstrap technique for double-censored data can be illustrated in steps as follows:
(ⅰ) Resample $n$ pairs $(x_i,d_i)$ with replacement from the original data. This creates one bootstrap sample, denoted by
$\text{Sample}^*_{\text{boot}} = \{(x^*_1,d^*_1), (x^*_2,d^*_2), \ldots, (x^*_n,d^*_n)\}$.
(ⅱ) Compute the function of interest $\hat{\theta}^* = \hat{\theta}(\text{Sample}^*_{\text{boot}})$ using the self-consistency algorithm [24].
(ⅲ) Perform steps (ⅰ) and (ⅱ) $B$ times; this provides $\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_B$.
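The three steps can be sketched as follows. This is only an outline under stated assumptions: the status labels `"left"`, `"event"`, and `"right"` are hypothetical stand-ins for the indicator $d_i$, and the estimator `censored_fraction` is a placeholder, not the self-consistency algorithm of [24], which would be plugged in at step (ⅱ) in a real analysis.

```python
import random


def censored_fraction(pairs):
    """Placeholder estimator: share of censored observations in a sample."""
    return sum(1 for _, status in pairs if status != "event") / len(pairs)


def efron_double_censored(pairs, estimator, B=1000, seed=None):
    """Steps (i)-(iii): resample pairs, apply the estimator, repeat B times."""
    rng = random.Random(seed)
    n = len(pairs)
    replicates = []
    for _ in range(B):
        boot = rng.choices(pairs, k=n)      # step (i): each pair has probability 1/n
        replicates.append(estimator(boot))  # step (ii)
    return replicates                       # step (iii)


# Toy double-censored sample (illustrative values only).
toy = [(0.4, "left"), (0.9, "event"), (1.3, "event"),
       (1.8, "right"), (2.2, "event"), (2.9, "right")]
reps_dc = efron_double_censored(toy, censored_fraction, B=200, seed=4)
```

Because whole pairs are resampled, each bootstrap sample keeps the censoring status of the original observations and usually repeats some pairs.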
3.1.2. Practical notes
Two notes are worth pointing out when applying Efron's bootstrap technique.
(ⅰ) Efron's bootstrap datasets often contain censored observations and ties due to the resampling process. The double-censored observations and ties may complicate the computations, especially for small samples and large censoring proportions.
(ⅱ) Efron's bootstrap technique for double-censored data will be reduced to Efron's bootstrap technique proposed for univariate real-valued data if the censoring proportion in the original sample is zero.
3.2. The SB method
For data including only event observations, [25,26] introduced the $A_{(n)}$ assumption, which provides a partial probability distribution for one future observation $X_{n+1}$ using the original sample. The support is partitioned by the original sample into $n+1$ intervals, and each interval has probability $1/(n+1)$, where $x_{(0)}=-\infty$ and $x_{(n+1)}=+\infty$ (for positive random variables, $x_{(0)}=0$ and $x_{(n+1)}=+\infty$).
This $A_{(n)}$ assumption is generalized to data including double-censored observations by [27], and this version is called the double-censored $A_{(n)}$ assumption. The double-censored $A_{(n)}$ assumption divides the support by the original data points into $n+1$ intervals and assigns certain probabilities to those intervals. Let $X_1, X_2, \ldots, X_n$ be exchangeable and positive random quantities, and suppose, for simplicity, that ties occur with probability zero. Let $x_1, x_2, \ldots, x_n$ be the observations corresponding to the random variables $X_1, X_2, \ldots, X_n$, where the observations include $u$ event times, $t_{(1)} < t_{(2)} < \ldots < t_{(u)}$, $v$ right-censored times, $rc_{(1)} < rc_{(2)} < \ldots < rc_{(v)}$, and $k = n-(u+v)$ left-censored times, $lc_{(1)} < lc_{(2)} < \ldots < lc_{(k)}$. Furthermore, let the support limits be $t_{(0)}$ and $t_{(u+1)}$, which are equal to $0$ and $\infty$, respectively. The probabilities for the next future observation $X_{n+1}$ to be in the intervals $(t_{(i)}, t_{(i+1)})$, $(rc_{(j)}, t_{rc_{(j)}})$, and $(t_{lc_{(w)}}, lc_{(w)})$ can be computed by the following functions:
where i=0,1,2,…,u, j=1,2,…,v, and w=1,2,…,k, with t(0)=0 and t(u+1)=∞. I(.) is the indicator function, trc(j) is the first event time greater than rc(j), and tlc(w) is the first event time less than lc(w).
Several notes help to clarify the double-censored $A_{(n)}$ assumption. First, each event interval has a probability greater than or equal to $1/(n+1)$, because the censored observations' probabilities are divided and added to the event intervals. Second, any right-censored interval has a probability less than or equal to $1/(n+1)$, because its own probability is spread forward to the observations greater than that right-censored observation. Third, any left-censored interval has a probability less than or equal to $1/(n+1)$, because its own probability is spread backward to the observations less than that left-censored observation. Lastly, if the data include event observations only, the double-censored $A_{(n)}$ assumption reduces to the original $A_{(n)}$ assumption.
3.2.1. The SB algorithm
To develop an SB technique for data including double-censored observations, the double-censored $A_{(n)}$ assumption is used. The SB algorithm is as follows:
(ⅰ) Divide the sample space into n+1 intervals in the form of (t(i),t(i+1)), (rc(j),trc(j)), and (tlc(w),lc(w)) and compute their assignment probabilities by Eq (3.3).
(ⅱ) Resample n intervals with the assignment probabilities.
(ⅲ) From each finite interval, draw one observation uniformly. For the case of infinite interval (x(i),∞), where the limit x(i) could be a right-censored observation or an event observation, an exponential tail is assumed with rate parameter λ(i) based on the corresponding assignment probabilities, then sample one observation from the tail for the interval (x(i),∞).
(ⅳ) Compute the function of interest ˆθ∗.
(ⅴ) Steps from (ⅱ) to (ⅳ) should be performed B times. This repetition will generate B bootstrap datasets along with their functions of interest.
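Steps (ⅱ) and (ⅲ) can be sketched once the intervals and their probabilities are available. In the code below the interval probabilities are simply passed in as inputs, and the tail rate is a user-supplied value; both are assumptions of this sketch, since it does not compute the probabilities of Eq (3.3) or derive $\lambda_{(i)}$ from them. The example uses the event-only case, where every interval has probability $1/(n+1)$.

```python
import math
import random


def sb_sample(intervals, probs, n, tail_rate, rng):
    """Draw n values: uniform on finite intervals, exponential tail on (lo, inf)."""
    chosen = rng.choices(intervals, weights=probs, k=n)  # step (ii)
    draws = []
    for lo, hi in chosen:                                # step (iii)
        if math.isinf(hi):
            # Assumed exponential tail starting at the lower limit.
            draws.append(lo + rng.expovariate(tail_rate))
        else:
            draws.append(rng.uniform(lo, hi))
    return draws


# Event-only toy example: the A(n) assumption gives equal interval probabilities.
events = [0.5, 1.2, 2.0]
knots = [0.0] + sorted(events) + [math.inf]
intervals = list(zip(knots[:-1], knots[1:]))
probs = [1 / len(intervals)] * len(intervals)
boot_sb = sb_sample(intervals, probs, n=len(events), tail_rate=1.0,
                    rng=random.Random(5))
```

Every value in `boot_sb` is an event observation, so the function of interest in step (ⅳ) can be computed with methods for uncensored data.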
3.2.2. Practical notes
Three practical notes help to give a full picture of the SB method.
(ⅰ) One advantage of the SB technique is that it samples from the whole data support, which prevents ties from occurring.
(ⅱ) The SB datasets' observations are all event, and this eases the calculations to find the functions of interest. This is contrary to Efron's bootstrap datasets, which often contain ties and censored observations due to the resampling process.
(ⅲ) The SB technique will return to Banks' bootstrap technique proposed for event data in case of zero censoring proportion in the original dataset.
In this paper, ties are assumed not to exist in the dataset; in real-world applications, however, this is not always the case. Seven cases of ties may be encountered: tied left-censoring times, tied event times, tied right-censoring times, ties among event and left-censoring times, ties among event and right-censoring times, ties among event and left-censoring times with right-censoring times, and ties among left-censoring and right-censoring times. In the first three cases, the tied observations are split by adding a tiny number to them. In the fourth case, we assume that the left-censoring times occur before the event observations. In the fifth case, the right-censoring times are assumed to occur after the event observations, an assumption widely adopted in the literature [28,29]. In the sixth case, the right-censoring times are assumed to occur after the event observations and the left-censoring times are assumed to occur before the event times. In the last case, the right-censoring times are assumed to occur after the left-censoring times.
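The ordering conventions for ties between different observation types amount to a simple sort rule: at equal times, left-censoring times come first, then event times, then right-censoring times. A minimal sketch is given below; the status labels and the function name `order_with_ties` are hypothetical, and the splitting of ties within the same type (the first three cases) is not covered.

```python
# Rank used to break ties at equal times: left-censored < event < right-censored.
TIE_ORDER = {"left": 0, "event": 1, "right": 2}


def order_with_ties(observations):
    """Sort (time, status) pairs by time, then by the tie convention."""
    return sorted(observations, key=lambda obs: (obs[0], TIE_ORDER[obs[1]]))


ordered = order_with_ties([(2.0, "right"), (2.0, "left"),
                           (1.0, "event"), (2.0, "event")])
```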
4. Comparison with Efron's bootstrap method
In this section, the SB method is compared to Efron's bootstrap method for double-censored data through simulation studies. The comparisons are in terms of the Type Ⅰ error rates of quartiles' hypothesis tests. In the simulations, we consider different scenarios by using Eqs (3.1) and (3.2); all scenarios are presented in Table 1 with the distributions' parameters and censoring proportions. For each scenario, we consider three distributions: the first is used to create event times, the second to create right-censored times, and the third to create left-censored times. To generate one double-censored dataset from any scenario, $n$ observations are generated from each distribution of that scenario, then Eqs (3.1) and (3.2) are applied. References [13,30] are helpful for determining the censoring proportions.
Table 1. The density functions for the distributions used in each scenario to create double-censored data.
To conduct comparisons between Efron's bootstrap method and the SB method, we generate N=1000 datasets from each scenario proposed in Table 1. For each generated dataset, we apply the methods B=1000 times. This leads to having 1000 bootstrap samples based on each method. We then compute the quartile of interest at each bootstrap sample, and from the resulting values, we can define the 100(1−2α)% bootstrap confidence interval for the quartile. We count one if the value of the quartile specified in the null hypothesis is not included in the confidence interval; otherwise, we count zero. We repeat this procedure for all N=1000 generated datasets, then count the number of times the null hypothesis was rejected over the 1000 trials. This ratio will be the Type Ⅰ error rate of the quartile's hypothesis test with significance level 2α.
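The simulation loop can be outlined as below. This is a simplified sketch rather than the study's code: it uses uncensored exponential data and Efron's resampling so that it stays self-contained, and the helper names, the percentile-interval indexing, and the reduced values of `N` and `B` are assumptions made to keep the run time short.

```python
import math
import random
import statistics


def percentile_interval(replicates, alpha):
    """Return a 100(1 - 2*alpha)% percentile interval from bootstrap replicates."""
    ordered = sorted(replicates)
    B = len(ordered)
    lower = ordered[int(alpha * B)]
    upper = ordered[min(B - 1, int((1 - alpha) * B))]
    return lower, upper


def type1_error_rate(generate, true_value, statistic, n, N, B, alpha, seed=0):
    """Share of N simulated datasets whose interval excludes the true value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(N):
        data = generate(rng, n)
        reps = [statistic(rng.choices(data, k=n)) for _ in range(B)]
        lo, hi = percentile_interval(reps, alpha)
        if not (lo <= true_value <= hi):
            rejections += 1  # null hypothesis rejected although it is true
    return rejections / N


# Test H0: median = ln(2) for Exp(1) data at significance level 2*alpha = 0.10.
rate = type1_error_rate(lambda rng, n: [rng.expovariate(1.0) for _ in range(n)],
                        true_value=math.log(2), statistic=statistics.median,
                        n=20, N=200, B=200, alpha=0.05)
```

A value of `rate` close to $2\alpha$ indicates that the actual error rate matches the nominal one; the SB method would be compared in the same way by replacing the resampling line with draws from the SB algorithm.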
Tables 2-19 present the Type Ⅰ error rates for the quartiles' hypothesis tests with significance levels 0.10 and 0.05. In scenarios 1 and 3, where the censoring proportion is 20% (10% right-censored and 10% left-censored observations), Efron's bootstrap method gives larger discrepancies between the nominal and actual error rates for the first quartile than the SB method, especially for small datasets. This can be seen in Tables 2 and 8. As the censoring proportion increases to 30% (15% right-censored and 15% left-censored observations), as in scenarios 2 and 4, Efron's bootstrap method provides poor results for the first quartile even with large sample sizes; the results are presented in Tables 5 and 11. This is perhaps because more left-censored times occur at the beginning and the self-consistency algorithm produces underestimates. In scenarios 5 and 6, where the censoring proportions are 32.5% and 29%, respectively, Efron's bootstrap method continues to provide poor results, as shown in Tables 14 and 17. In contrast, the SB method mostly provides good results, keeping the discrepancies between the nominal and actual error rates for the first quartile small regardless of the censoring proportions; this can be considered one of its advantages over Efron's method.
Table 2. Type Ⅰ error rates with significance level $2\alpha=0.10$ and $0.05$ for $H_0: Q_1=0.2207$ based on the SB method and Efron's method (first scenario).
Tables 3, 6, 9, 12, 15, and 18 present the Type Ⅰ error rates for the second-quartile hypothesis tests with significance levels 0.10 and 0.05 in all scenarios, with different censoring proportions. Efron's bootstrap method appears to give larger discrepancies between the actual and nominal error rates than the SB method when $n=20$. As the sample size increases, the results of Efron's bootstrap method improve. The SB method mostly leads to better results at all sample sizes: its discrepancies between the actual and nominal error rates are smaller than those of Efron's method regardless of the censoring proportions.
In Tables 4, 7, 10, 13, 16, and 19, the Type Ⅰ error rates for the third-quartile hypothesis tests are presented with significance levels 0.10 and 0.05. When the sample size is 20 or 40, in all scenarios with different censoring proportions, the SB method mostly provides better results than Efron's method. As the sample size increases, the SB method still provides good results, but Efron's method is better. This could be because the third quartile is located in the tail, and the exponential tails assumed for the infinite intervals are pulled in by the assigned probabilities; therefore, they do not cover the true third quartile.
5. Conclusions
In this paper, an SB method for double-censored data has been used for hypothesis testing. It is easy to implement and gives good results. Through simulation studies, the SB method has been compared to Efron's bootstrap method for double-censored data in terms of the Type Ⅰ error rates of quartiles' hypothesis tests. The SB technique mostly performs better than Efron's technique, in particular for small and medium sample sizes. The bootstrap samples created by Efron's bootstrap method contain ties and double-censored observations because resampling is restricted to the original data. This can complicate the computations and lead to poor results, specifically for small datasets and large censoring proportions. These disadvantages are avoided by the SB method, which uses the double-censored $A_{(n)}$ assumption [27] to generate bootstrap samples containing only event observations and no ties.
The implementation of the SB method in R required nearly 15% more time than Efron's bootstrap method. The SB method takes longer because the observations must first be ordered and the $n+1$ intervals created, then the probabilities corresponding to those intervals computed, and only after these steps are observations drawn from the intervals to generate the bootstrap samples.
A problem often experienced by applied researchers is the analysis of time to event data. Such data examples can be in public health, engineering, economics, medicine, biology, epidemiology, and demography. The SB method and Efron's method are applicable to all these disciplines and the focus can be on applications of the techniques to biology and medicine because they usually include double-censored data, for example, duration of response to treatment, time to recurrence of a disease, time to development of a disease, analyzing data on the time to death from a certain cause, or simply time to death.
Given the results obtained with the SB method, it would be worthwhile to investigate the approach further by computing Type Ⅱ error rates through simulation studies. It is also expected that the SB method can provide good results for survival and reliability inferences. These topics are left for future research.
Use of AI tools declaration
The author declares he has not used Artificial Intelligence (AI) tools in the creation of this article.
Acknowledgements
The researcher would like to thank the Deanship of Scientific Research, Qassim University for funding the publication of this project.
Conflict of interest
The author states that there is no conflict of interest.
References
[1]
B. Efron, Bootstrap methods: another look at the jackknife, Ann. Statist., 7 (1979), 1–26. https://doi.org/10.1214/aos/1176344552
[2]
B. Efron, R. J. Tibshirani, An introduction to the bootstrap, Chapman and Hall, 1993.
D. Berrar, Introduction to the non-parametric bootstrap, Encycl. Bioinf. Comput. Biol., 1 (2019), 766–773. https://doi.org/10.1016/B978-0-12-809633-8.20350-6
[5]
D. L. Banks, Histospline smoothing the Bayesian bootstrap, Biometrika, 75 (1988), 673–684. https://doi.org/10.2307/2336308
[6]
F. P. A. Coolen, S. B. Himd, Nonparametric predictive inference bootstrap with application to reproducibility of the two-sample Kolmogorov-Smirnov test, J. Stat. Theory Pract., 14 (2020), 26. https://doi.org/10.1007/s42519-020-00097-5
[7]
B. Efron, Censored data and the bootstrap, J. Amer. Stat. Assoc., 76 (1981), 312–319. https://doi.org/10.2307/2287832
[8]
B. Efron, R. Tibshirani, Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Statist. Sci., 1 (1986), 54–77. https://doi.org/10.1214/ss/1177013815
[9]
A. S. M. Al Luhayb, F. P. A. Coolen, T. Coolen-Maturi, Generalizing Banks' smoothed bootstrap method for right-censored data, Proceedings of the 29th European Safety and Reliability Conference, Hannover, Germany, 2019, 894–901. https://doi.org/10.3850/978-981-11-2724-3_0177-cd
[10]
A. S. M. Al Luhayb, T. Coolen-Maturi, F. P. A. Coolen, Smoothed bootstrap for survival function inference, 2019 International Conference on Information and Digital Technologies (IDT), Zilina, Slovakia, 2019, 296–303. https://doi.org/10.1109/DT.2019.8813347
[11]
A. S. M. Al Luhayb, F. P. A. Coolen, T. Coolen-Maturi, Smoothed bootstrap for right-censored data, Commun. Stat.-Theory Methods, 2023, 1–25. https://doi.org/10.1080/03610926.2023.2171708
[12]
F. P. A. Coolen, K. J. Yan, Nonparametric predictive inference with right-censored data, J. Stat. Plan. Infer., 126 (2004), 25–54. https://doi.org/10.1016/j.jspi.2003.07.004
[13]
A. S. M. Al Luhayb, Smoothed bootstrap methods for right-censored data and bivariate data, Ph.D. Thesis, Durham University, 2021.
[14]
A. S. M. Al Luhayb, T. Coolen-Maturi, F. P. A. Coolen, Smoothed bootstrap methods for bivariate data, J. Stat. Theory Pract., 17 (2023), 37. https://doi.org/10.1007/s42519-023-00334-7
[15]
J. L. Rasmussen, Estimating correlation coefficients: bootstrap and parametric approaches, Psychol. Bull., 101 (1987), 136–139. https://doi.org/10.1037/0033-2909.101.1.136
[16]
M. J. Strube, Bootstrap Type Ⅰ error rates for the correlation coefficient: an examination of alternate procedures, Psychol. Bull., 104 (1988), 290–292. https://doi.org/10.1037/0033-2909.104.2.290
M. Dolker, S. Halperin, D. R. Divgi, Problems with bootstrapping Pearson correlations in very small bivariate samples, Psychometrika, 47 (1982), 529–530. https://doi.org/10.1007/BF02293714
[19]
J. G. MacKinnon, Bootstrap hypothesis testing, In: D. A. Belsley, E. J. Kontoghiorghes, Handbook of computational econometrics, 2009, 183–213. https://doi.org/10.1002/9780470748916.ch6
T. Coolen-Maturi, F. P. A. Coolen, N. Muhammad, Predictive inference for bivariate data: combining nonparametric predictive inference for marginals with an estimated copula, J. Stat. Theory Pract., 10 (2016), 515–538. https://doi.org/10.1080/15598608.2016.1184112
[22]
N. Muhammad, Predictive inference with copulas for bivariate data, Ph.D. Thesis, Durham University, UK, 2016.
[23]
N. Muhammad, F. P. A. Coolen, T. Coolen-Maturi, Predictive inference for bivariate data with nonparametric copula, AIP Conf. Proc., 1750 (2016), 060004. https://doi.org/10.1063/1.4954609
[24]
J. P. Klein, M. L. Moeschberger, Survival analysis: techniques for censored and truncated data, New York: Springer, 2003. https://doi.org/10.1007/b97377
[25]
B. M. Hill, Posterior distribution of percentiles: Bayes' theorem for sampling from a population, J. Amer. Stat. Assoc., 63 (1968), 677–691. https://doi.org/10.1080/01621459.1968.11009286
[26]
B. M. Hill, De Finetti's theorem, induction, and A(n) or Bayesian nonparametric predictive inference (with discussion), In: J. M. Bernardo, M. H. DeGroot, D. V. Lindley, A. F. M. Smith, Bayesian statistics, Oxford University Press, 3 (1988), 211–241.
[27]
A. S. M. Al Luhayb, Nonparametric statistical method for prediction in case of data including double-censored observations, Pak. J. Statist., 39 (2023), 485–500.
[28]
L. M. Berliner, B. M. Hill, Bayesian nonparametric survival analysis, J. Amer. Stat. Assoc., 83 (1988), 772–779. https://doi.org/10.1080/01621459.1988.10478660
[29]
E. L. Kaplan, P. Meier, Nonparametric estimation from incomplete observations, J. Amer. Stat. Assoc., 53 (1958), 457–481. https://doi.org/10.1080/01621459.1958.10501452
[30]
F. Wan, Simulating survival data with predefined censoring rates for proportional hazards models, Stat. Med., 36 (2017), 838–854. https://doi.org/10.1002/sim.7178