Abstract
The semiparametric accelerated failure time (AFT) model is not as widely used as the Cox relative risk model due to computational difficulties. Recent developments in least squares estimation and induced smoothing estimating equations for censored data provide promising tools to make the AFT models more attractive in practice. For multivariate AFT models, we propose a generalized estimating equations (GEE) approach, extending the GEE to censored data. The consistency of the regression coefficient estimator is robust to misspecification of working covariance, and the efficiency is higher when the working covariance structure is closer to the truth. The marginal error distributions and regression coefficients are allowed to be unique for each margin or partially shared across margins as needed. The initial estimator is a rank-based estimator with Gehan’s weight, but obtained from an induced smoothing approach with computational ease. The resulting estimator is consistent and asymptotically normal, with variance estimated through a multiplier resampling method. In a large-scale simulation study, our estimator was up to three times as efficient as the estimator that ignores the within-cluster dependence, especially when the within-cluster dependence was strong. The methods were applied to the bivariate failure times data from a diabetic retinopathy study.
References
Brown BM, Wang Y-G (2005) Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92(1):149–158
Brown BM, Wang Y-G (2007) Induced smoothing for rank regression with censored survival times. Stat Med 26(4):828–836
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66:429–436
Chiou SH, Kang S, Yan J (2013) Fast accelerated failure time modeling for case-cohort data. Stat Comput. doi:10.1007/s11222-013-9388-2
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc 34:187–220
Diabetic Retinopathy Study Research Group (1976) Preliminary report on effects of photocoagulation therapy. Am J Ophthalmol 81(4):383–396
Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52:203–223
Halekoh U, Højsgaard S (2006) The R package geepack for generalized estimating equations. J Stat Softw 15(2):1–11
Hornsteiner U, Hamerle A (1996) A combined GEE/Buckley–James method for estimating an accelerated failure time model of multivariate failure times. Discussion paper 47, Ludwig-Maximilians-Universität München, Collaborative Research Center 386. http://epub.ub.uni-muenchen.de/1444/. Accessed 12 Feb 2014
Huang Y (2002) Calibration regression of censored lifetime medical cost. J Am Stat Assoc 97(457):318–327
Huster WJ, Brookmeyer R, Self SG (1989) Modelling paired survival data with covariates. Biometrics 45:145–156
Jin Z, Lin DY, Wei LJ, Ying Z (2003) Rank-based inference for the accelerated failure time model. Biometrika 90(2):341–353
Jin Z, Lin DY, Ying Z (2006a) On least-squares regression with censored data. Biometrika 93(1):147–161
Jin Z, Lin DY, Ying Z (2006b) Rank regression analysis of multivariate failure time data based on marginal linear models. Scand J Stat 33(1):1–23
Johnson LM, Strawderman RL (2009) Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96(3):577–590
Komárek A, Lesaffre E, Hilton JF (2005) Accelerated failure time model for arbitrarily censored data with smoothed error distribution. J Comput Graph Stat 14(3):726–745
Lai TL, Ying Z (1991) Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. Ann Stat 19:1370–1402
Lee EW, Wei LJ, Ying Z (1993) Linear regression analysis for highly stratified failure time data. J Am Stat Assoc 88:557–565
Li H, Yin G (2009) Generalized method of moments estimation for linear regression with clustered failure time data. Biometrika 96(2):293–306
Liang K-Y, Self SG, Chang Y-C (1993) Modelling marginal hazards in multivariate failure time data. J R Stat Soc 55:441–453
Liang K-Y, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
Luo X, Huang C-Y (2011) Analysis of recurrent gap time data using the weighted risk set method and the modified within-cluster resampling method. Stat Med 30(4):301–311
Novák P (2013) Goodness-of-fit test for the accelerated failure time model based on martingale residuals. Kybernetika 49:40–59
Prentice RL (1978) Linear rank tests with right censored data (Corr: V70 p304). Biometrika 65:167–180
Qu A, Lindsay BG, Li B (2000) Improving generalised estimating equations using quadratic inference functions. Biometrika 87(4):823–836
Ritov Y (1990) Estimation in a linear regression model with censored data. Ann Stat 18:303–328
Robins JM, Rotnitzky A (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V (eds) AIDS epidemiology—methodological issues. Birkhäuser, Boston, pp 297–331
Spiekerman CF, Lin DY (1996) Checking the marginal Cox model for correlated failure time data. Biometrika 83:143–156
Strawderman RL (2005) The accelerated gap times model. Biometrika 92(3):647–666
Stute W (1993) Consistent estimation under random censorship when covariables are present. J Multivar Anal 45:89–103
Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23(4):461–471
Tsiatis AA (1990) Estimating regression parameters using linear rank tests for censored data. Ann Stat 18:354–372
Wang M-C, Chang S-H (1999) Nonparametric estimation of a recurrent survival function. J Am Stat Assoc 94:146–153
Wang Y-G, Fu L (2011) Rank regression for accelerated failure time model with clustered and censored data. Comput Stat Data Anal 55(7):2334–2343
Yan J, Fine J (2004) Estimating equations for association structures (Pkg: P859–880). Stat Med 23(6):859–874
Ying Z (1993) A large sample study of rank estimation for censored regression data. Ann Stat 21:76–99
Yu L (2011) Nonparametric quasi-likelihood for right censored data. Lifetime Data Anal 17:594–607
Yu L, Peace KE (2012) Spline nonparametric quasi-likelihood regression within the frame of the accelerated failure time model. Comput Stat Data Anal 56:2675–2687
Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102(480):1387–1396
Zhou M (1992) \(M\)-estimation in censored linear models. Biometrika 79:837–841
Appendix
1.1 Sketches of the Proofs
We impose the following regularity conditions:
- A1: \(\Vert X_i\Vert \le B\) for all \(i = 1, \ldots , n\) and some nonrandom constant \(B\), where \(\Vert \cdot \Vert \) is a matrix norm.
- A2: The density function of \(F_{k, \beta }\) exists such that \(\int _{-\infty }^\infty t^2{\mathrm {d}}F_{k, \beta }(t) < \infty \), for \(k=1, \ldots , K\).
- A3: The distribution function \(F_{k, \beta }\) is twice differentiable with density \(f_{k, \beta }\) such that $$\begin{aligned} \int \limits _{-\infty }^\infty \left( \frac{f_{k, \beta }^\prime (t)}{f_{k, \beta }(t)}\right) ^2 {\mathrm {d}}F_{k, \beta }(t) < \infty \end{aligned}$$ where \(1 \le k \le K\), and both \(f_{k, \beta }(t)\) and \(f^\prime _{k, \beta }(t)\) are bounded functions.
- A4: \(E[\exp (\theta \epsilon _{ik}^-)]+ \sup _{k\in \{1, \ldots , K\}} E[\exp (\theta C_{ik}^- )] < \infty \) for some \(\theta > 0\), where \(a^-=|a|I_{\{a\le 0\}}\).
- A5: \(\sup _{| b | < \infty ; -\infty < t < \infty }\sum _{i=1}^n\sum _{k=1}^K \Pr (t \le C_{ik} - X_{ik}^\top b \le t +h) = O(nh)\) as \(h \rightarrow 0\) and \(nh \rightarrow \infty \).
- A6: As \(n\rightarrow \infty \), \(\hat{\alpha }_n\) is bounded and is \(n^{1/2}\)-consistent for \(\alpha _0\) given \(\beta \).
- A7: As \(n\rightarrow \infty \), the initial estimator \(b_n\) is \(n^{1/2}\)-consistent for \(\beta _0\), and \(\sqrt{n}( b_n - \beta _0)\) is asymptotically normal with zero mean.
- A8: The slope matrices \(n^{-1} \partial U_n / \partial \beta \) and \(n^{-1} \partial U_n / \partial b\) evaluated at \((\beta _0, \beta _0, \alpha _0)\) converge to nondegenerate, finite limits \(A\) and \(B\), respectively.
- A9: The derivative \(\partial \Omega _i^{-1}(\alpha ) / \partial \alpha \) is finite for all \(i = 1, 2, \ldots , n\).
Conditions A1–A5 are standard and ensure the existence of the solution of Eq. (2) (Lai and Ying 1991). It is natural to assume that the working covariance matrix \(\Omega \) in Eq. (4) is a symmetric positive definite matrix. Then there exists a \(K\times K\) nonsingular matrix, \(\Gamma \), such that \(\Omega (\alpha _0) = \Gamma ^{1/2} \Gamma ^{1/2}\). Let \(\mathbb X_i = \Gamma ^{-1/2} X_i\), \(\mathbb T_i = \Gamma ^{-1/2} Y_i\), \(\mathbb C_i = \Gamma ^{-1/2} C_i\), and \(\omega _i = \Gamma ^{-1/2} \epsilon _i\). Then Eq. (4) evaluated at \(\alpha = \alpha _0\) can be viewed as Eq. (2) with the transformed data \(\mathbb X_i\) and \(\mathbb Y_i = \min (\mathbb T_i, \mathbb C_i)\), with error \(\omega _i\), \(i = 1, \ldots , n\). The existence of the solution to Eq. (4) can be verified by the same arguments as in Lai and Ying (1991), with assumptions similar to A1–A5 on the transformed data. The consistency and asymptotic normality of the estimator given \(\alpha = \alpha _0\) follow from the same arguments as in Jin et al. (2006a).
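The transformation by \(\Gamma ^{-1/2}\) can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, a hypothetical exchangeable working covariance with \(K = 3\), simulated uncensored data, and a GEE-type weighted estimating function of the form \(X_i^\top \Omega ^{-1}(Y_i - X_i\beta )\), which is an assumption about the structure of Eq. (4). It only checks the linear algebra: the weighted estimating function on the original scale equals the unweighted one on the transformed scale.

```python
import numpy as np

def inverse_sqrt(omega):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(omega)
    if np.any(eigvals <= 0):
        raise ValueError("omega must be positive definite")
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T

# Hypothetical exchangeable working covariance for K = 3 margins (assumption).
K, rho = 3, 0.6
omega = (1 - rho) * np.eye(K) + rho * np.ones((K, K))
gamma_inv_half = inverse_sqrt(omega)

# Whitening: Gamma^{-1/2} Omega Gamma^{-1/2} should be the identity.
whitened_cov = gamma_inv_half @ omega @ gamma_inv_half

# Simulated data for one cluster (illustration only, no censoring).
rng = np.random.default_rng(0)
X_i = rng.normal(size=(K, 2))      # K x p covariate matrix
Y_i = rng.normal(size=K)           # K responses
beta = np.array([0.5, -1.0])       # arbitrary evaluation point

# Transformed data, as in the text.
XX_i = gamma_inv_half @ X_i
YY_i = gamma_inv_half @ Y_i

# Weighted estimating function on the original scale ...
weighted = X_i.T @ np.linalg.inv(omega) @ (Y_i - X_i @ beta)
# ... equals the unweighted one on the transformed scale.
unweighted = XX_i.T @ (YY_i - XX_i @ beta)
```

The symmetric square root is used here because the text writes \(\Omega (\alpha _0) = \Gamma ^{1/2}\Gamma ^{1/2}\); a Cholesky factor would whiten the errors equally well but is not symmetric.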
The extra complexity here comes from the fact that Eq. (4) is solved at \(\alpha = \hat{\alpha }_n\), an estimator of \(\alpha _0\). Under condition A9, the \(i\)th term in the summation of \(\partial U_n / \partial \alpha \) evaluated at \((\beta _0, \beta _0, \alpha _0)\) is a linear function of \(\hat{Y}_i(\beta _0)-X_i^\top \beta _0\), \(i = 1, \ldots , n\), with expectation zero. By the law of large numbers, \(n^{-1}\partial U_n/\partial \alpha \) evaluated at \((\beta _0, \beta _0, \alpha _0)\) converges to zero in probability.
1.1.1 Proof of Theorem 1
At the solution \(\hat{\beta }_n^{(1)}\) given \(b_n\) and \(\hat{\alpha }_n\), we have \(n^{-1} U_n(\hat{\beta }_n^{(1)}, b_n, \hat{\alpha }_n) = 0\). Taylor expansion at \((\beta _0, \beta _0, \alpha _0)\) gives
Under regularity conditions A1–A5, the first term converges in probability to zero by the law of large numbers. The convergence of \(b_n\) and \(\hat{\alpha }_n\) in A6 and A7, combined with the limit conditions in A8 and A9, then gives consistency of \(\hat{\beta }_n^{(1)}\) for \(\beta _0\). By induction, \(\hat{\beta }^{(m)}_n\) is consistent for \(\beta _0\) at every \(m\).
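A first-order expansion of the kind used in this argument can be sketched as follows. This is a reconstruction under assumptions rather than the authors' display: it takes \(U_n(\beta , b, \alpha )\) to be differentiable in all three arguments, writes \(A_n\), \(B_n\), and \(C_n\) for \(n^{-1}\partial U_n/\partial \beta \), \(n^{-1}\partial U_n/\partial b\), and \(n^{-1}\partial U_n/\partial \alpha \) evaluated at \((\beta _0, \beta _0, \alpha _0)\), and assumes a remainder \(R_n\) of smaller order.

$$\begin{aligned} 0 = n^{-1} U_n(\hat{\beta }_n^{(1)}, b_n, \hat{\alpha }_n) = n^{-1} U_n(\beta _0, \beta _0, \alpha _0) + A_n (\hat{\beta }_n^{(1)} - \beta _0) + B_n (b_n - \beta _0) + C_n (\hat{\alpha }_n - \alpha _0) + R_n . \end{aligned}$$

Under this sketch, the terms in \(b_n - \beta _0\) and \(\hat{\alpha }_n - \alpha _0\) vanish in probability by A6 and A7, and the nondegenerate limit of \(A_n\) in A8 then forces \(\hat{\beta }_n^{(1)} - \beta _0\) to vanish in probability as well.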
1.1.2 Proof of Theorem 2
Under regularity conditions \(\sqrt{n}(\hat{\beta }_n^{(1)}-\beta _0)\) can be expressed as
With condition A9, \(C_n\) converges to zero in probability, and, hence, with \(\sqrt{n}\) consistency of \(\hat{\alpha }_n\), \(C_n \sqrt{n} (\hat{\alpha }_n - \alpha _0) = o_p(1)\). Equation (11) is then asymptotically equivalent to
With the assumption in A7 that \(\sqrt{n}(b_n-\beta _0)\) is asymptotically normal, there exist some nonrandom functions \(\eta _i\) with zero mean such that,
On the other hand, \(U_n(\beta _0, \beta _0, \alpha _0)\) is a sum of independent and identically distributed quantities with zero mean, denoted by \(\phi _i\)’s, \(i = 1, \ldots , n\). Equation (11) reduces to
By the multivariate central limit theorem for sums of independent random vectors, the asymptotic distribution of \(\sqrt{n}(\hat{\beta }_n^{(1)} - \beta _0)\) is zero-mean multivariate normal as \(n\rightarrow \infty \). The limit covariance matrix \(\Sigma \) has the form \(A^{-1}\Phi A^{-1}\), where \(\Phi = \lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n \imath _i \imath _i^{\top }\) with \(\imath _i = \phi _i + B\eta _i\). Induction then implies that \(\hat{\beta }_n^{(m)}\) is asymptotically multivariate normal for every \(m\).
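The chain of approximations behind this result can be sketched as follows. This is a reconstruction under assumptions, not the authors' displays: \(A_n\), \(B_n\), and \(C_n\) denote \(n^{-1}\partial U_n/\partial \beta \), \(n^{-1}\partial U_n/\partial b\), and \(n^{-1}\partial U_n/\partial \alpha \) evaluated at \((\beta _0, \beta _0, \alpha _0)\), the representation \(\sqrt{n}(b_n - \beta _0) = n^{-1/2}\sum _{i=1}^n \eta _i + o_p(1)\) is assumed, and the leading sign follows from taking \(A\) literally as the limit of the slope in A8, so it may differ from the convention used in the original.

$$\begin{aligned} \sqrt{n}(\hat{\beta }_n^{(1)} - \beta _0)&= -A_n^{-1}\left\{ n^{-1/2} U_n(\beta _0, \beta _0, \alpha _0) + B_n \sqrt{n}(b_n - \beta _0) + C_n \sqrt{n}(\hat{\alpha }_n - \alpha _0)\right\} + o_p(1) \\&= -A^{-1}\left\{ n^{-1/2}\sum _{i=1}^n \phi _i + B\, n^{-1/2}\sum _{i=1}^n \eta _i\right\} + o_p(1) \\&= -A^{-1} n^{-1/2}\sum _{i=1}^n \imath _i + o_p(1). \end{aligned}$$

The sign does not affect the limiting covariance, which is \(A^{-1}\Phi A^{-1}\) whenever \(A\) is symmetric.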
1.2 Analytic details of \(W(t, x)\) in model checking
Using arguments similar to those in Jin et al. (2006a) and Novák (2013), \(W(t, x)\) can be shown to have the same asymptotic distribution as \(\hat{W}(t, x)\), where
where \(\hat{f}_N(t, x) = n^{-1}\sum _{i=1}^n\sum _{k=1}^K\Delta _{ik}\omega _{ik}(x) \hat{f}_0(t) X_{ik}\), \(\hat{f}_Y(t, x) = n^{-1}\sum _{i=1}^n \sum _{k=1}^K \omega _{ik}(x) \hat{g}_0(t) X_{ik}\), and \(f_0(t)\) and \(g_0(t)\) are the baseline densities of \(\epsilon _{ik}\) and \(e_{ik}(\beta _0)\), respectively, with kernel estimates \(\hat{f}_0(t)\) and \(\hat{g}_0(t)\) (e.g., Novák 2013), obtained with \(\beta _0\) replaced by \(\hat{\beta }_{n}^{(m)}\). Note that the same multipliers \(Z_i\) used to obtain the bootstrap samples \(\hat{\beta }_{n}^{(m)*}\) are reused here.
Cite this article
Chiou, S.H., Kang, S., Kim, J. et al. Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations. Lifetime Data Anal 20, 599–618 (2014). https://doi.org/10.1007/s10985-014-9292-x
DOI: https://doi.org/10.1007/s10985-014-9292-x