Greenwood-based confidence intervals are widely used to quantify uncertainty in survival quantile estimation based on the Kaplan-Meier estimator, and narrow intervals are often interpreted as evidence of stable and reliable inference. However, such numerical precision does not directly address the reproducibility of inferential conclusions under repeated sampling. This paper investigates the relationship between Greenwood-based confidence-interval precision and reproducibility in survival quantile inference. Reproducibility is quantified using the reproducibility probability (RP), defined as the probability that a survival quantile estimate is reproduced within a specified tolerance under repeated sampling, along with its decision-based analogue RP$(D)$ for two-group survival comparisons. Both measures are estimated via a nonparametric bootstrap framework under a fixed study design and sample size. Extensive simulations are conducted for single-group and two-group settings under exponential, Weibull, and lognormal survival distributions, with independent and dependent right-censoring. The results show that Greenwood-based confidence-interval width is not a reliable indicator of reproducibility: narrow intervals may coexist with low RP, whereas wider intervals may be associated with high RP, depending on the distribution, censoring mechanism, and inferential target. In two-group comparisons, decision reproducibility is driven primarily by the stability of the ordering between group-specific quantiles rather than by the numerical precision of individual estimates; under dependent censoring, decision reproducibility can be high even when confidence intervals are wide. These findings highlight a fundamental distinction between numerical precision and inferential reproducibility in survival analysis and underscore the need to assess reproducibility alongside conventional confidence-interval reporting.
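To make the bootstrap estimation of RP and RP$(D)$ concrete, the following is a minimal Python sketch and not the paper's implementation. Several choices are assumptions made only for illustration: the tolerance is taken to be relative (a fraction `tol` of the original quantile estimate), the two-group "decision" is taken to be the sign of the difference between the group-specific quantile estimates, and the function names `km_quantile`, `bootstrap_rp`, and `bootstrap_rp_decision` are hypothetical.

```python
import numpy as np

def km_quantile(time, event, p=0.5):
    """Kaplan-Meier p-th quantile: smallest t with S(t) <= 1 - p (inf if not reached)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    surv = 1.0
    for t in np.unique(time[event]):              # distinct event times, ascending
        at_risk = np.sum(time >= t)
        deaths = np.sum((time == t) & event)
        surv *= 1.0 - deaths / at_risk            # product-limit update
        if surv <= 1.0 - p + 1e-12:               # small slack for rounding error
            return float(t)
    return np.inf                                 # censoring prevents estimation

def bootstrap_rp(time, event, p=0.5, tol=0.1, n_boot=500, seed=0):
    """Share of bootstrap quantiles within tol * q_hat of the original estimate.

    Assumption: relative tolerance; the source only says 'a specified tolerance'.
    """
    rng = np.random.default_rng(seed)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    q_hat = km_quantile(time, event, p)
    if not np.isfinite(q_hat):
        return np.nan
    n, hits = len(time), 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample (time, event) pairs
        q_star = km_quantile(time[idx], event[idx], p)
        hits += int(abs(q_star - q_hat) <= tol * q_hat)
    return hits / n_boot

def bootstrap_rp_decision(t1, e1, t2, e2, p=0.5, n_boot=500, seed=0):
    """Share of bootstrap replicates reproducing the ordering of the two quantiles.

    Assumption: the decision is the sign of the quantile difference.
    """
    rng = np.random.default_rng(seed)
    t1, t2 = np.asarray(t1, dtype=float), np.asarray(t2, dtype=float)
    e1, e2 = np.asarray(e1, dtype=bool), np.asarray(e2, dtype=bool)
    sign_hat = np.sign(km_quantile(t1, e1, p) - km_quantile(t2, e2, p))
    agree = 0
    for _ in range(n_boot):
        i1 = rng.integers(0, len(t1), size=len(t1))   # resample within each group
        i2 = rng.integers(0, len(t2), size=len(t2))
        diff = km_quantile(t1[i1], e1[i1], p) - km_quantile(t2[i2], e2[i2], p)
        agree += int(np.sign(diff) == sign_hat)
    return agree / n_boot

# Illustrative data: exponential survival times with independent exponential censoring.
rng = np.random.default_rng(1)
n = 100
t_a, c_a = rng.exponential(1.0, n), rng.exponential(3.0, n)
t_b, c_b = rng.exponential(1.5, n), rng.exponential(3.0, n)
time_a, event_a = np.minimum(t_a, c_a), t_a <= c_a
time_b, event_b = np.minimum(t_b, c_b), t_b <= c_b

print("RP (median, group A):", bootstrap_rp(time_a, event_a))
print("RP(D) (median, A vs B):", bootstrap_rp_decision(time_a, event_a, time_b, event_b))
```

In this sketch the two quantities can diverge: the median in one group may vary by more than the tolerance across resamples while the ordering of the two group medians remains stable, which is the kind of contrast the abstract describes.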
Citation: Norah D. Alshahrani. On the reproducibility of survival quantile decisions beyond Greenwood-based precision[J]. AIMS Mathematics, 2026, 11(4): 9191-9209. doi: 10.3934/math.2026379