Boosting multi-state models. (English) Zbl 1356.65030

Summary: One important goal in multi-state modelling is to explore conditional transition-type-specific hazard rate functions by estimating the effects of explanatory variables. This may be done with separate transition-type-specific models if the covariate effects are assumed to differ across transition types. To investigate whether this assumption holds, or whether an effect is equal across several transition types (a cross-transition-type effect), a combined model has to be applied, for instance via a stratified partial likelihood formulation. Prior knowledge about the underlying covariate effect mechanisms is often sparse, especially about which transition-type-specific or cross-transition-type effects are absent. Data-driven variable selection is therefore an important task: a large number of estimable effects has to be taken into account if all transition types are modelled jointly. A related, subsequent task is model choice: is an effect estimated satisfactorily under the assumption of linearity, or does its true form deviate strongly from linearity? This article introduces component-wise functional gradient descent boosting (boosting for short) for multi-state models, an approach that performs unsupervised variable selection and model choice simultaneously within a single estimation run. We demonstrate that the features and advantages of boosting introduced and illustrated in classical regression scenarios carry over to multi-state models. As a consequence, boosting provides an effective means to answer questions about ineffectiveness and non-linearity of single transition-type-specific or cross-transition-type effects.
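To illustrate how component-wise boosting selects variables while it estimates, the following is a minimal sketch under simplifying assumptions, not the method of the article: it uses squared-error loss and simple linear base-learners instead of a stratified partial likelihood, and the function name `componentwise_l2_boost` and its parameters `n_steps` and `nu` are hypothetical. In each step, every covariate is fitted separately to the current negative gradient (here the residuals), and only the best-fitting one receives a small update. Covariates that are never chosen keep a coefficient of exactly zero.

```python
def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Component-wise boosting with squared-error loss (illustrative sketch).

    X: list of rows (lists of floats), y: list of responses.
    Returns (intercept, coefs); never-selected covariates keep coefficient 0.0.
    """
    n, p = len(y), len(X[0])
    # Centre each covariate so the intercept can be handled separately.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    offset = sum(y) / n
    coefs = [0.0] * p
    fit = [offset] * n
    for _ in range(n_steps):
        # Negative gradient of the loss 0.5 * (y - f)^2 is the residual.
        resid = [yi - fi for yi, fi in zip(y, fit)]
        best_j, best_beta, best_rss = None, 0.0, float("inf")
        for j, col in enumerate(cols):
            sxx = sum(c * c for c in col)
            if sxx == 0.0:
                continue  # constant covariate carries no information
            beta = sum(c * r for c, r in zip(col, resid)) / sxx
            rss = sum((r - beta * c) ** 2 for c, r in zip(col, resid))
            if rss < best_rss:
                best_j, best_beta, best_rss = j, beta, rss
        if best_j is None:
            break
        # Update only the selected component, shrunken by the step length nu.
        coefs[best_j] += nu * best_beta
        fit = [f + nu * best_beta * c for f, c in zip(fit, cols[best_j])]
    # Undo the centring to express the intercept on the original scale.
    intercept = offset - sum(b * m for b, m in zip(coefs, means))
    return intercept, coefs


# Toy data: only the first covariate influences the response.
X = [[float(i), float((i * i) % 7), float((3 * i) % 5)] for i in range(20)]
y = [2.0 * row[0] + 1.0 for row in X]
intercept, coefs = componentwise_l2_boost(X, y)
print(intercept, coefs)
```

In this toy run the two uninformative covariates are never selected, which is the sense in which a single boosting run performs variable selection. In practice the number of steps would be chosen by a data-driven criterion such as cross-validation rather than fixed in advance.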

MSC:

65C60 Computational problems in statistics (MSC2010)
62-04 Software, source code, etc. for problems pertaining to statistics
62P10 Applications of statistics to biology and medical sciences; meta analysis
