Low-dimensional confounder adjustment and high-dimensional penalized estimation for survival analysis. (English) Zbl 1372.62089

Summary: High-throughput profiling is now common in biomedical research. In this paper we consider an etiology study comprising a failure time response and gene expression measurements. In current practice, a widely adopted approach is to select genes by a preliminary marginal screening, followed by a penalized regression for model building. Confounders, such as clinical risk factors and environmental exposures, usually exist and need to be properly accounted for. We propose covariate-adjusted screening and variable selection procedures under the accelerated failure time model. While penalizing the high-dimensional coefficients to achieve parsimonious model forms, our procedure also properly adjusts for the low-dimensional confounder effects to achieve more accurate estimation of the regression coefficients. We establish the asymptotic properties of our proposed methods and carry out simulation studies to assess their finite-sample performance. Our methods are illustrated with a real gene expression data analysis in which proper adjustment for confounders produces more meaningful results.
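To make the idea of covariate-adjusted screening concrete, the following is a minimal sketch and not the authors' procedure. It assumes an accelerated failure time model of the form log T = Z'α + X'β + ε, handles right censoring with Kaplan-Meier (Stute) weights, and scores each gene by a weighted least-squares fit that always includes the confounders. The function names `stute_weights` and `adjusted_screening`, the standardization step, and the ranking rule are illustrative assumptions; the summary does not specify them.

```python
import numpy as np


def stute_weights(time, event):
    """Kaplan-Meier (Stute) weights for right-censored data.

    Assumption: `event` is 1 for an observed failure and 0 for censoring.
    Censored observations receive weight 0.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    n = len(time)
    # Sort by time; at tied times, place failures before censored cases.
    order = np.lexsort((1 - event, time))
    d = event[order].astype(float)
    ranks = np.arange(1, n + 1)
    factors = ((n - ranks) / (n - ranks + 1.0)) ** d
    # Product over all earlier order statistics (empty product = 1).
    prev = np.concatenate(([1.0], np.cumprod(factors)[:-1]))
    w_sorted = d / (n - ranks + 1.0) * prev
    w = np.empty(n)
    w[order] = w_sorted
    return w


def adjusted_screening(log_time, event, Z, X, top_k):
    """Rank genes by |coefficient| after adjusting for confounders Z.

    Hypothetical illustration: each gene is fitted together with an
    intercept and all confounders by weighted least squares.
    """
    log_time = np.asarray(log_time, dtype=float)
    w = stute_weights(log_time, event)
    sw = np.sqrt(w)
    n, p = X.shape
    # Standardize genes so that coefficients are comparable across genes.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    base = np.column_stack([np.ones(n), Z])
    scores = np.empty(p)
    for j in range(p):
        design = np.column_stack([base, Xs[:, j]])
        coef, *_ = np.linalg.lstsq(design * sw[:, None], log_time * sw, rcond=None)
        scores[j] = abs(coef[-1])
    keep = np.argsort(scores)[::-1][:top_k]
    return keep, scores
```

The genes retained in `keep` would then be passed, together with the unpenalized confounders, to a penalized regression step; that step is omitted here because the summary does not say which penalty or algorithm is used.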

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62G08 Nonparametric regression and quantile regression
62G20 Asymptotic properties of nonparametric inference
62N02 Estimation in survival analysis and censored data

References:

[1] Cai T, Huang J, Tian L (2009) Regularized estimation for the accelerated failure time model. Biometrics 65:394-404 · Zbl 1274.62736 · doi:10.1111/j.1541-0420.2008.01074.x
[2] Cheng MY, Zhang W, Chen LH (2009) Statistical estimation in generalized multiparameter likelihood models. J Am Stat Assoc 104:1179-1191 · Zbl 1388.62160 · doi:10.1198/jasa.2009.tm08430
[3] Cheng MY, Honda T, Zhang JT (2015) Forward variable selection for sparse ultra-high dimensional varying coefficient models. J Am Stat Assoc. arXiv:1410.6556 · Zbl 1162.62037
[4] Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348-1360 · Zbl 1073.62547 · doi:10.1198/016214501753382273
[5] Fan J, Lv J (2008) Sure independence screening for ultrahigh dimensional feature space. J R Stat Soc Ser B 70:849-911 · Zbl 1411.62187 · doi:10.1111/j.1467-9868.2008.00674.x
[6] Gordis L (2008) Epidemiology, 4th edn. Saunders, Philadelphia
[7] Huang J, Ma S, Xie H (2006) Regularized estimation in the accelerated failure time model with high dimensional covariate. Biometrics 62:813-820 · Zbl 1111.62090 · doi:10.1111/j.1541-0420.2006.00562.x
[8] Johnson BA, Lin DY, Zeng D (2008) Penalized estimating functions and variable selection in semiparametric regression models. J Am Stat Assoc 103:672-680 · Zbl 1471.62330 · doi:10.1198/016214508000000184
[9] Li J, Ma S (2010) Interval-censored data with repeated measurements and a cured subgroup. Appl Stat 59:693-705
[10] Lian H, Li J, Tang X (2014) SCAD-penalized regression in additive partially linear proportional hazards models with an ultra-high-dimensional linear part. J Multivar Anal 125:50-64 · Zbl 1359.62130 · doi:10.1016/j.jmva.2013.12.002
[11] Lu Y, Lemon W et al (2006) A gene expression signature predicts survival of subjects with stage I non-small cell lung cancer. PLoS Med 3:2229-2243 · doi:10.1371/journal.pmed.0030467
[12] Shao F, Li J, Ma S, Lee M-LT (2014) Semiparametric varying-coefficient model for interval censored data with a cured proportion. Stat Med 33:1700-1712 · doi:10.1002/sim.6054
[13] Stute W (1993) Consistent estimation under random censorship when covariates are present. J Multivar Anal 45:89-103 · Zbl 0767.62036 · doi:10.1006/jmva.1993.1028
[14] VanderWeele TJ, Shpitser I (2013) On the definition of a confounder. Ann Stat 41:196-220 · Zbl 1347.62017 · doi:10.1214/12-AOS1058
[15] Xie Y, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37:673-696 · Zbl 1162.62037 · doi:10.1214/07-AOS580