Predicting disease risks by matching quantiles estimation for censored data

Peng Wu; Baosheng Liang; Yifan Xia; Xingwei Tong

doi:10.3934/mbe.2020251

Mathematical Biosciences and Engineering

2020, Volume 17, Issue 5: 4544-4562. doi: 10.3934/mbe.2020251

Research article

1. School of Statistics, Beijing Normal University, Beijing 100875, China
2. Department of Biostatistics, School of Public Health, Peking University, Beijing 100191, China
3. Institute of Medical Technology, Peking University, Beijing 100191, China

Received: 29 March 2020 Accepted: 18 June 2020 Published: 29 June 2020

In time-to-event data analysis, it is often of interest to predict quantities such as the t-year survival rate or the survival function over a continuum of time. A commonly used approach is to relate the survival time to the covariates by a semiparametric regression model and then use the fitted model for prediction, which usually results in direct estimation of the conditional hazard function or the conditional estimating equation. Its prediction accuracy, however, relies on correct specification of the covariate-survival association, which is often difficult in practice, especially when patient populations are heterogeneous or the underlying model is complex. In this paper, from a prediction perspective, we propose a disease-risk prediction approach that matches an optimal combination of covariates with the survival time in terms of distribution quantiles. The proposed method is easy to implement and works flexibly without assuming an a priori model. The redistribution-of-mass technique is adopted to accommodate censoring. We establish theoretical properties of the proposed method. Simulation studies and a real data example are also provided to further illustrate its practical utility.
Keywords: censored data; matching quantiles estimation; redistribution of mass; survival prediction
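
The redistribution-of-mass idea named in the abstract can be illustrated with a minimal sketch of the classical redistribute-to-the-right scheme for right-censored data: each observation starts with mass 1/n, and the mass of every censored observation is passed on equally to the observations to its right. The resulting masses on the event times coincide with the jumps of the Kaplan-Meier estimator. The function names `redistribute_to_the_right` and `weighted_quantile` and the toy data are assumptions made for illustration; how the paper embeds these weights in its matching quantiles objective is not shown here.

```python
import numpy as np

def redistribute_to_the_right(times, events):
    """Return sorted times, event flags, and redistributed masses.

    times  : observed times (event or censoring time, whichever came first)
    events : 1/True if the event was observed, 0/False if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    n = len(times)
    # Sort by time; at tied times, events are placed before censored points.
    order = np.lexsort((~events, times))
    t, d = times[order], events[order]
    mass = np.full(n, 1.0 / n)
    for i in range(n - 1):
        if not d[i]:
            # Pass the censored point's mass equally to all later points.
            mass[i + 1:] += mass[i] / (n - i - 1)
            mass[i] = 0.0
    # If the largest observation is censored, its mass is left in place
    # and is never counted as event mass below.
    return t, d, mass

def weighted_quantile(t, d, mass, tau):
    """Smallest event time whose cumulative event mass reaches tau.

    Returns NaN when the event mass never reaches tau (heavy censoring).
    """
    cum = np.cumsum(np.where(d, mass, 0.0))
    idx = np.searchsorted(cum, tau, side="left")
    return t[idx] if idx < len(t) else np.nan

# Toy data (hypothetical): five subjects, two of them censored.
t, d, mass = redistribute_to_the_right([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
print(np.round(mass, 4))                 # event times carry 1/5, 4/15, 8/15
print(weighted_quantile(t, d, mass, 0.5))
```

The quantiles computed this way are quantiles of the estimated survival-time distribution, which is the kind of quantity a matching quantiles procedure would compare against the quantiles of a covariate combination.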
Citation: Peng Wu, Baosheng Liang, Yifan Xia, Xingwei Tong. Predicting disease risks by matching quantiles estimation for censored data[J]. Mathematical Biosciences and Engineering, 2020, 17(5): 4544-4562. doi: 10.3934/mbe.2020251

© 2020 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)