Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations

Abstract

The semiparametric accelerated failure time (AFT) model is not as widely used as the Cox relative risk model due to computational difficulties. Recent developments in least squares estimation and induced smoothing estimating equations for censored data provide promising tools to make the AFT models more attractive in practice. For multivariate AFT models, we propose a generalized estimating equations (GEE) approach, extending the GEE to censored data. The consistency of the regression coefficient estimator is robust to misspecification of working covariance, and the efficiency is higher when the working covariance structure is closer to the truth. The marginal error distributions and regression coefficients are allowed to be unique for each margin or partially shared across margins as needed. The initial estimator is a rank-based estimator with Gehan’s weight, but obtained from an induced smoothing approach with computational ease. The resulting estimator is consistent and asymptotically normal, with variance estimated through a multiplier resampling method. In a large-scale simulation study, our estimator was up to three times as efficient as the estimator that ignores the within-cluster dependence, especially when the within-cluster dependence was strong. The methods were applied to the bivariate failure times data from a diabetic retinopathy study.

References

Brown BM, Wang Y-G (2005) Standard errors and covariance matrices for smoothed rank estimators. Biometrika 92(1):149–158
Brown BM, Wang Y-G (2007) Induced smoothing for rank regression with censored survival times. Stat Med 26(4):828–836
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66:429–436
Chiou SH, Kang S, Yan J (2013) Fast accelerated failure time modeling for case-cohort data. Stat Comput. doi:10.1007/s11222-013-9388-2
Cox DR (1972) Regression models and life-tables (with discussion). J R Stat Soc 34:187–220
Diabetic Retinopathy Study Research Group (1976) Preliminary report on effects of photocoagulation therapy. Am J Ophthalmol 81(4):383–396
Gehan EA (1965) A generalized Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52:203–223
Halekoh U, Højsgaard S (2006) The R package geepack for generalized estimating equations. J Stat Softw 15(2):1–11
Hornsteiner U, Hamerle A (1996) A combined GEE/Buckley–James method for estimating an accelerated failure time model of multivariate failure times. Discussion paper 47, Ludwig-Maximilians-Universität München, Collaborative Research Center 386. http://epub.ub.uni-muenchen.de/1444/. Accessed 12 Feb 2014
Huang Y (2002) Calibration regression of censored lifetime medical cost. J Am Stat Assoc 97(457):318–327
Huster WJ, Brookmeyer R, Self SG (1989) Modelling paired survival data with covariates. Biometrics 45:145–156
Jin Z, Lin DY, Wei LJ, Ying Z (2003) Rank-based inference for the accelerated failure time model. Biometrika 90(2):341–353
Jin Z, Lin DY, Ying Z (2006a) On least-squares regression with censored data. Biometrika 93(1):147–161
Jin Z, Lin DY, Ying Z (2006b) Rank regression analysis of multivariate failure time data based on marginal linear models. Scand J Stat 33(1):1–23
Johnson LM, Strawderman RL (2009) Induced smoothing for the semiparametric accelerated failure time model: asymptotics and extensions to clustered data. Biometrika 96(3):577–590
Komárek A, Lesaffre E, Hilton JF (2005) Accelerated failure time model for arbitrarily censored data with smoothed error distribution. J Comput Graph Stat 14(3):726–745
Lai TL, Ying Z (1991) Large sample theory of a modified Buckley–James estimator for regression analysis with censored data. Ann Stat 19:1370–1402
Lee EW, Wei LJ, Ying Z (1993) Linear regression analysis for highly stratified failure time data. J Am Stat Assoc 88:557–565
Li H, Yin G (2009) Generalized method of moments estimation for linear regression with clustered failure time data. Biometrika 96(2):293–306
Liang K-Y, Self SG, Chang Y-C (1993) Modelling marginal hazards in multivariate failure time data. J R Stat Soc 55:441–453
Liang K-Y, Zeger SL (1986) Longitudinal data analysis using generalized linear models. Biometrika 73:13–22
Luo X, Huang C-Y (2011) Analysis of recurrent gap time data using the weighted risk set method and the modified within-cluster resampling method. Stat Med 30(4):301–311
Novák P (2013) Goodness-of-fit test for the accelerated failure time model based on martingale residuals. Kybernetika 49:40–59
Prentice RL (1978) Linear rank tests with right censored data (Corr: V70 p304). Biometrika 65:167–180
Qu A, Lindsay BG, Li B (2000) Improving generalised estimating equations using quadratic inference functions. Biometrika 87(4):823–836
Ritov Y (1990) Estimation in a linear regression model with censored data. Ann Stat 18:303–328
Robins JM, Rotnitzky A (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell N, Dietz K, Farewell V (eds) AIDS epidemiology—methodological issues. Birkhäuser, Boston, pp 297–331
Spiekerman CF, Lin DY (1996) Checking the marginal Cox model for correlated failure time data. Biometrika 83:143–156
Strawderman RL (2005) The accelerated gap times model. Biometrika 92(3):647–666
Stute W (1993) Consistent estimation under random censorship when covariables are present. J Multivar Anal 45:89–103
Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23(4):461–471
Tsiatis AA (1990) Estimating regression parameters using linear rank tests for censored data. Ann Stat 18:354–372
Wang M-C, Chang S-H (1999) Nonparametric estimation of a recurrent survival function. J Am Stat Assoc 94:146–153
Wang Y-G, Fu L (2011) Rank regression for accelerated failure time model with clustered and censored data. Comput Stat Data Anal 55(7):2334–2343
Yan J, Fine J (2004) Estimating equations for association structures (Pkg: P859–880). Stat Med 23(6):859–874
Ying Z (1993) A large sample study of rank estimation for censored regression data. Ann Stat 21:76–99
Yu L (2011) Nonparametric quasi-likelihood for right censored data. Lifetime Data Anal 17:594–607
Yu L, Peace KE (2012) Spline nonparametric quasi-likelihood regression within the frame of the accelerated failure time model. Comput Stat Data Anal 56:2675–2687
Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102(480):1387–1396
Zhou M (1992) $M$-estimation in censored linear models. Biometrika 79:837–841

Author information

Authors and Affiliations

Department of Mathematics and Statistics, University of Minnesota, Duluth, Duluth, MN, USA
Sy Han Chiou
Department of Applied Statistics, Yonsei University, Seoul, Korea
Sangwook Kang
Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
Junghi Kim
Department of Statistics, University of Connecticut, Storrs, CT, USA
Jun Yan
Center for Public Health and Health Policy Research, University of Connecticut Health Center, East Hartford, CT, USA
Jun Yan
Center for Environmental Sciences & Engineering, University of Connecticut, Storrs, CT, USA
Jun Yan

Corresponding author

Correspondence to Jun Yan.

Appendix

1.1 Sketches of the Proofs

We impose the following regularity conditions:

A1:
$\Vert X_i\Vert \le B$ for all $i = 1, \ldots , n$ and some nonrandom constant $B$, where $\Vert \cdot \Vert $ is a matrix norm.
A2:
The density function of $F_{k, \beta }$ exists such that $\int _{-\infty }^\infty t^2{\mathrm {d}}F_{k, \beta }(t) < \infty $, for $k=1, \ldots , K$.
A3:
The distribution function $F_{k, \beta }$ is twice differentiable with density $f_{k, \beta }$ such that
$$\begin{aligned} \int \limits _{-\infty }^\infty \left( \frac{f_{k, \beta }^\prime (t)}{f_{k, \beta }(t)}\right) ^2 {\mathrm {d}}F_{k, \beta }(t) < \infty \end{aligned}$$
where $1 \le k \le K$, and both $f_{k, \beta }(t)$ and $f^\prime _{k, \beta }(t)$ are bounded functions.
A4:
$E[\exp (\theta \epsilon _{ik}^-)]+ \sup _{k\in \{1, \ldots , K\}} E[\exp (\theta C_{ik}^- )] < \infty $ for some $\theta > 0$, where $a^-=|a|I_{\{a\le 0\}}$.
A5:
$\sup _{| b | < \infty ; -\infty < t < \infty }\sum _{i=1}^n\sum _{k=1}^K \Pr (t \le C_{ik} - X_{ik}^\top b \le t +h) = O(nh)$ as $h \rightarrow 0$ and $nh \rightarrow \infty $.
A6:
As $n\rightarrow \infty $, $\hat{\alpha }_n$ is bounded and is $n^{1/2}$-consistent for $\alpha _0$ given $\beta $.
A7:
As $n\rightarrow \infty $, the initial estimator $b_n$ is $n^{1/2}$-consistent for $\beta _0$ and $\sqrt{n}( b_n - \beta _0)$ is asymptotically normal with zero mean.
A8:
The slope matrices $n^{-1} \partial U_n / \partial \beta $ and $n^{-1} \partial U_n / \partial b$ evaluated at $(\beta _0, \beta _0, \alpha _0)$ converge to nondegenerate, finite limits $A$ and $B$, respectively.
A9:
The derivative $\partial \Omega _i^{-1}(\alpha ) / \partial \alpha $ is finite for all $i = 1, 2, \ldots , n$.

Conditions A1–A5 are standard and ensure the existence of the solution of Eq. (2) (Lai and Ying 1991). It is natural to assume that the working covariance matrix $\Omega $ in Eq. (4) is symmetric and positive definite. Then there exists a $K\times K$ nonsingular matrix $\Gamma $ such that $\Omega (\alpha _0) = \Gamma ^{1/2} \Gamma ^{1/2}$. Let $\mathbb X_i = \Gamma ^{-1/2} X_i$, $\mathbb T_i = \Gamma ^{-1/2} Y_i$, $\mathbb C_i = \Gamma ^{-1/2} C_i$, and $\omega _i = \Gamma ^{-1/2} \epsilon _i$. Then Eq. (4) evaluated at $\alpha = \alpha _0$ can be viewed as Eq. (2) with the transformed data $\mathbb X_i$ and $\mathbb Y_i = \min (\mathbb T_i, \mathbb C_i)$, with error $\omega _i$, $i = 1, \ldots , n$. The existence of the solution to Eq. (4) can be verified by the same arguments as in Lai and Ying (1991), with assumptions similar to A1–A5 on the transformed data. The consistency and asymptotic normality of the estimator given $\alpha = \alpha _0$ follow from the same arguments as in Jin et al. (2006a).
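As a rough illustration of the transformation step, the following Python sketch computes a symmetric square root of a hypothetical $2\times 2$ exchangeable working covariance and applies its inverse to the covariates and responses of one cluster. The correlation 0.6, the cluster data, and all helper names are made-up assumptions, not taken from the paper, and the sketch covers only the linear algebra; censoring is not handled.

```python
import math

def sym_sqrt_2x2(m):
    """Symmetric square root of a 2x2 symmetric positive definite matrix.

    Uses the closed form sqrt(M) = (M + sqrt(det M) I) / sqrt(tr M + 2 sqrt(det M)).
    """
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)      # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)    # normalizing constant
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def inv_2x2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

# Hypothetical working covariance for K = 2 margins (assumed correlation 0.6).
omega = [[1.0, 0.6], [0.6, 1.0]]
root = sym_sqrt_2x2(omega)     # plays the role of Gamma^{1/2}
root_inv = inv_2x2(root)       # plays the role of Gamma^{-1/2}

# Hypothetical cluster: K = 2 rows, p = 2 covariates, and a response vector.
X_i = [[1.0, 0.0], [1.0, 1.0]]
Y_i = [[2.3], [1.7]]
X_tilde = matmul(root_inv, X_i)   # transformed covariates
Y_tilde = matmul(root_inv, Y_i)   # transformed responses
```

After the transformation, the working covariance of the transformed errors is the identity, which is why the independence-type arguments for Eq. (2) carry over.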

The extra complexity here comes from the fact that Eq. (4) is solved at $\alpha = \hat{\alpha }_n$, an estimator of $\alpha _0$. Under condition A9, the $i$th term in the summation of $\partial U_n / \partial \alpha $ evaluated at $(\beta _0, \beta _0, \alpha _0)$ is a linear function of $\hat{Y}_i(\beta _0)-X_i^\top \beta _0$, $i = 1, \ldots , n$, with expectation zero. By the law of large numbers, $n^{-1}\partial U_n/\partial \alpha $ evaluated at $(\beta _0, \beta _0, \alpha _0)$ converges to zero in probability.

1.1.1 Proof of Theorem 1

At the solution $\hat{\beta }_n^{(1)}$ given $b_n$ and $\hat{\alpha }_n$, we have $n^{-1} U_n(\hat{\beta }_n^{(1)}, b_n, \hat{\alpha }_n) = 0$. Taylor expansion at $(\beta _0, \beta _0, \alpha _0)$ gives

$$\begin{aligned} 0&= \frac{1}{n}U_{n}(\beta _0, \beta _0, \alpha _0) + \frac{1}{n} \frac{\partial }{\partial \beta }\left[ U_n(\beta _0, \beta _0, \alpha _0) \right] (\hat{\beta }_n^{(1)}-\beta _0) \\&\quad + \frac{1}{n}\frac{\partial }{\partial b}\left[ U_n(\beta _0, \beta _0, \alpha _0) \right] (b_n-\beta _0) + \frac{1}{n} \frac{\partial }{\partial \alpha }\left[ U_n(\beta _0, \beta _0, \alpha _0) \right] (\hat{\alpha }_n-\alpha _0) + o_p(n^{-1/2}) \\&= \frac{1}{n}U_n(\beta _0, \beta _0, \alpha _0) + A_n (\hat{\beta }_n^{(1)}-\beta _0) +B_n (b_n-\beta _0)+C_n(\hat{\alpha }_n-\alpha _0) + o_p(n^{-1/2}). \end{aligned} \tag{10}$$

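Here $A_n$, $B_n$, and $C_n$ denote $n^{-1}\partial U_n / \partial \beta $, $n^{-1}\partial U_n / \partial b$, and $n^{-1}\partial U_n / \partial \alpha $, respectively, evaluated at $(\beta _0, \beta _0, \alpha _0)$. Under regularity conditions A1–A5, the first term converges in probability to zero by the law of large numbers. The convergence of $\hat{\alpha }_n$ and $b_n$ in A6 and A7, combined with the limit conditions in A8 and A9, then gives consistency of $\hat{\beta }_n^{(1)}$ for $\beta _0$. By induction, $\hat{\beta }^{(m)}_n$ is consistent for $\beta _0$ for every $m$.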

1.1.2 Proof of Theorem 2

Under the regularity conditions, $\sqrt{n}(\hat{\beta }_n^{(1)}-\beta _0)$ can be expressed as

$$\begin{aligned} \sqrt{n}(\hat{\beta }_n^{(1)}-\beta _0) =\left[ A_n\right] ^{-1}\left[ \frac{1}{\sqrt{n}} U_n(\beta _0, \beta _0, \alpha _0)+B_n\sqrt{n}(b_n-\beta _0)+ C_n\sqrt{n}(\hat{\alpha }_n - \alpha _0)\right] + o_p(1). \end{aligned} \tag{11}$$

With condition A9, $C_n$ converges to zero in probability, and, hence, with $\sqrt{n}$ consistency of $\hat{\alpha }_n$, $C_n \sqrt{n} (\hat{\alpha }_n - \alpha _0) = o_p(1)$. Equation (11) is then asymptotically equivalent to

$$\begin{aligned} \left[ A_n\right] ^{-1}\left[ \frac{1}{\sqrt{n}} U_n(\beta _0, \beta _0, \alpha _0)+B_n\sqrt{n}(b_n-\beta _0)\right] . \end{aligned}$$

With the assumption that $\sqrt{n}(b_n-\beta _0)$ is asymptotically normal, there exist some nonrandom functions $\eta _i$ with zero mean such that

$$\begin{aligned} \sqrt{n}(b_n - \beta _0) = n^{-1/2}\sum _{i=1}^n\eta _i + o_p\left( \Vert b_n-\beta _0\Vert \right) . \end{aligned}$$

On the other hand, $U_n(\beta _0, \beta _0, \alpha _0)$ is a sum of independent and identically distributed quantities with zero mean, denoted by $\phi _i$’s, $i = 1, \ldots , n$. Equation (11) reduces to

$$\begin{aligned} \sqrt{n}(\hat{\beta }_n^{(1)}-\beta _0) = \left[ A_n\right] ^{-1}\left[ n^{-1/2}\sum _{i=1}^n \left( \phi _i+B_n\eta _i\right) \right] + o_p\left( \Vert b_n-\beta _0\Vert \right) . \end{aligned}$$

By the multivariate central limit theorem for sums of independent random vectors, the asymptotic distribution of $\sqrt{n}(\hat{\beta }_n^{(1)}-\beta _0)$ is zero-mean multivariate normal as $n\rightarrow \infty $. The limit covariance matrix $\Sigma $ has the form $A^{-1}\Phi A^{-1}$, where $\Phi = \lim _{n\rightarrow \infty }n^{-1}\sum _{i=1}^n \imath _i \imath _i^{\top }$ with $\imath _i = \phi _i + B\eta _i$. Induction then implies that $\hat{\beta }_n^{(m)}$ is asymptotically multivariate normal for every $m$.
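The sandwich form $A^{-1}\Phi A^{-1}$ can be illustrated numerically. The Python sketch below assumes the slope matrix $A$ and the influence terms $\imath _i$ are already available as plain lists; the function names and the toy inputs are hypothetical and are not part of the paper. It shows only the matrix arithmetic for $p = 2$ coefficients, not how $A$, $\phi _i$, or $\eta _i$ would be obtained.

```python
def inv_2x2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(p, q):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(len(q)))
             for j in range(len(q[0]))] for i in range(len(p))]

def sandwich_2x2(A, iotas):
    """Return A^{-1} Phi A^{-1}, with Phi the average outer product of the iotas."""
    n = len(iotas)
    p = len(A)
    # Phi = n^{-1} * sum_i iota_i iota_i^T
    Phi = [[sum(v[r] * v[c] for v in iotas) / n for c in range(p)]
           for r in range(p)]
    A_inv = inv_2x2(A)
    return matmul(matmul(A_inv, Phi), A_inv)

# Toy inputs (assumed values, chosen so the result is easy to check by hand).
A = [[2.0, 0.0], [0.0, 4.0]]
iotas = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
Sigma = sandwich_2x2(A, iotas)
```

With these toy inputs, $\Phi = \mathrm{diag}(0.5, 2)$ and $A^{-1} = \mathrm{diag}(0.5, 0.25)$, so the result is $\mathrm{diag}(0.125, 0.125)$.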

1.2 Analytic details of $W(t, x)$ in model checking

Using arguments similar to those in Jin et al. (2006a) and Novák (2013), $W(t, x)$ can be shown to have the same asymptotic distribution as $\hat{W}(t, x)$, where

$$\begin{aligned} \hat{W}(t, x)&= n^{-1/2}\sum _{i=1}^n\sum _{k=1}^K\int \limits _0^t \left( \omega _{ik}(x) - \frac{\sum _{j=1}^n\sum _{l=1}^K \omega _{jl}(x) I[e_{jl}(\hat{\beta }_n^{(m)})\ge u]}{\sum _{j=1}^n\sum _{l=1}^K I[e_{jl}(\hat{\beta }_n^{(m)})\ge u]}\right) {\mathrm {d}}\hat{M}_{ik}(u, \hat{\beta }_n^{(m)}) (Z_i - 1)\\&\quad -n^{1/2}\left( \hat{f}_N(t, x) + \int \limits _0^t\hat{f}_Y(u, x){\mathrm {d}}\hat{\Lambda }(u, \hat{\beta }_n^{(m)})\right) ^\top (\hat{\beta }_n^{(m)}-\hat{\beta }_n^{(m)*})\\&\quad -n^{-1/2}\int \limits _0^t\sum _{i=1}^n\sum _{k=1}^K\omega _{ik}(x)I[e_{ik}(\hat{\beta }_n^{(m)})\ge u]\,{\mathrm {d}}\left( \hat{\Lambda }(u, \hat{\beta }_n^{(m)}) - \hat{\Lambda }(u, \hat{\beta }_n^{(m)*})\right) , \end{aligned}$$

where $\hat{f}_N(t, x) = n^{-1}\sum _{i=1}^n\sum _{k=1}^K\Delta _{ik}\omega _{ik}(x) \hat{f}_0(t) X_{ik}$, $\hat{f}_Y(t, x) = n^{-1}\sum _{i=1}^n \sum _{k=1}^K \omega _{ik}(x) \hat{g}_0(t) X_{ik}$, and $f_0(t)$ and $g_0(t)$ are the baseline densities of $\epsilon _{ik}$ and $e_{ik}(\beta _0)$, respectively, with kernel estimates $\hat{f}_0(t)$ and $\hat{g}_0(t)$ (e.g., Novák 2013), obtained with $\beta _0$ replaced by $\hat{\beta }_{n}^{(m)}$. Note that the multipliers $Z_i$’s used to obtain the bootstrap samples $\hat{\beta }_{n}^{(m)*}$ are used again here.
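To give a feel for how multipliers $Z_i$ generate resampled estimates, the following Python sketch applies the idea to a deliberately simple estimating equation, $\sum _i Z_i (y_i - \theta ) = 0$, whose solution is a weighted mean. This is a toy stand-in rather than the AFT estimating equations of the paper; the unit exponential distribution for $Z_i$ (mean 1, variance 1), the simulated data, and the function name are all assumptions made for illustration.

```python
import math
import random
import statistics

def multiplier_replicates(y, n_rep=500, seed=0):
    """Solve sum_i Z_i (y_i - theta) = 0 for n_rep independent draws of Z."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_rep):
        # One positive multiplier per observation, with mean 1 and variance 1.
        z = [rng.expovariate(1.0) for _ in y]
        # The perturbed equation has the closed-form solution below.
        reps.append(sum(zi * yi for zi, yi in zip(z, y)) / sum(z))
    return reps

# Simulated data (assumed): 200 standard normal observations.
data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

reps = multiplier_replicates(y)
se_multiplier = statistics.stdev(reps)                 # resampling standard error
se_textbook = statistics.stdev(y) / math.sqrt(len(y))  # usual standard error of the mean
```

In this toy setting the spread of the replicates should be close to the usual standard error of the mean. Reusing the same draws of $Z_i$ for both the replicates and any derived process, as the text describes for $\hat{W}(t, x)$, amounts to storing each vector `z` alongside its replicate.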

About this article

Cite this article

Chiou, S.H., Kang, S., Kim, J. et al. Marginal semiparametric multivariate accelerated failure time model with generalized estimating equations. Lifetime Data Anal 20, 599–618 (2014). https://doi.org/10.1007/s10985-014-9292-x

Received: 20 March 2013
Accepted: 28 January 2014
Published: 19 February 2014
Issue Date: October 2014
DOI: https://doi.org/10.1007/s10985-014-9292-x
