Regression trees for longitudinal and multiresponse data. (English) Zbl 1454.62198

Summary: Previous algorithms for constructing regression tree models for longitudinal and multiresponse data have mostly followed the CART approach. Consequently, they inherit the same selection biases and computational difficulties as CART. We propose an alternative, based on the GUIDE approach, that treats each longitudinal data series as a curve and uses chi-squared tests of the residual curve patterns to select a variable to split each node of the tree. Besides being unbiased, the method is applicable to data with fixed and random time points and with missing values in the response or predictor variables. Simulation results comparing its mean squared prediction error with that of MVPART are given, as well as examples comparing it with standard linear mixed effects and generalized estimating equation models. Conditions for asymptotic consistency of regression tree function estimates are also given.
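The split-selection idea described above (treat each longitudinal series as a curve, then use chi-squared tests on residual curve patterns to pick the split variable at a node) can be sketched as follows. This is a minimal illustration, not the paper's GUIDE implementation: the function name, the summary of a residual curve by the sign of its average, and the quantile binning of predictors are all simplifying assumptions of mine.

```python
# Hypothetical sketch of GUIDE-style unbiased split-variable selection
# for longitudinal responses. Each subject's residual curve is reduced
# to a sign pattern, which is cross-tabulated against binned predictor
# values; the predictor with the smallest chi-squared p-value splits
# the node. (Simplified: the paper uses richer residual patterns.)
import numpy as np
from scipy.stats import chi2_contingency

def select_split_variable(Y, X, n_bins=3):
    """Return (index, p-value) of the most significant predictor.

    Y : (n, t) array of longitudinal responses (rows = subjects).
    X : (n, p) array of predictor values.
    """
    # Residual curves: subtract the node's mean response curve.
    resid = Y - Y.mean(axis=0)
    # Summarize each residual curve by the sign of its average.
    pattern = (resid.mean(axis=1) > 0).astype(int)

    best_var, best_p = None, 1.0
    for j in range(X.shape[1]):
        # Bin the (numeric) predictor into quantile groups.
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        groups = np.digitize(X[:, j], edges)
        # Contingency table: residual-sign pattern vs. predictor bin.
        table = np.zeros((2, groups.max() + 1))
        for s, g in zip(pattern, groups):
            table[s, g] += 1
        # Drop empty columns; skip degenerate tables.
        table = table[:, table.sum(axis=0) > 0]
        if table.shape[1] < 2 or (table.sum(axis=1) == 0).any():
            continue
        _, p, _, _ = chi2_contingency(table)
        if p < best_p:
            best_var, best_p = j, p
    return best_var, best_p
```

Because each candidate predictor is judged by the same kind of chi-squared test, rather than by the best split it can achieve, variables with many distinct values gain no advantage, which is the source of the unbiasedness claimed for the GUIDE approach.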

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62G08 Nonparametric regression and quantile regression
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

UCI-ml; lme4; geepack; mvpart

References:

[1] Abdolell, M., LeBlanc, M., Stephens, D. and Harrison, R. V. (2002). Binary partitioning for continuous longitudinal data: Categorizing a prognostic variable. Stat. Med. 21 3395-3409.
[2] Alexander, C. S. and Markowitz, R. (1986). Maternal employment and use of pediatric clinic services. Med. Care 24 134-147.
[3] Asuncion, A. and Newman, D. J. (2007). UCI Machine Learning Repository.
[4] Bates, D. (2011). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-42.
[5] Breiman, L., Friedman, J. H., Olshen, R. A. and Stone, C. J. (1984). Classification and Regression Trees . Wadsworth, Belmont, CA. · Zbl 0541.62042
[6] Chaudhuri, P. and Loh, W.-Y. (2002). Nonparametric estimation of conditional quantiles using quantile regression trees. Bernoulli 8 561-576. · Zbl 1009.62031
[7] Chaudhuri, P., Huang, M. C., Loh, W. Y. and Yao, R. (1994). Piecewise-polynomial regression trees. Statist. Sinica 4 143-167. · Zbl 0824.62032
[8] Chaudhuri, P., Lo, W. D., Loh, W.-Y. and Yang, C. C. (1995). Generalized regression trees. Statist. Sinica 5 641-666. · Zbl 0824.62060
[9] Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. J. Amer. Statist. Assoc. 74 829-836. · Zbl 0423.62029 · doi:10.2307/2286407
[10] De’ath, G. (2002). Multivariate regression trees: A new technique for modeling species-environment relationships. Ecology 83 1105-1117.
[11] De’ath, G. (2012). MVPART: Multivariate partitioning. R package version 1.6-0.
[12] Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data , 2nd ed. Oxford Statistical Science Series 25 . Oxford Univ. Press, Oxford. · Zbl 1031.62002
[13] Fitzmaurice, G. M., Laird, N. M. and Ware, J. H. (2004). Applied Longitudinal Analysis . Wiley, Hoboken, NJ. · Zbl 1057.62052
[14] Härdle, W. (1990). Applied Nonparametric Regression. Econometric Society Monographs 19 . Cambridge Univ. Press, Cambridge. · Zbl 0714.62030
[15] Hothorn, T., Hornik, K. and Zeileis, A. (2006). Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Statist. 15 651-674. · doi:10.1198/106186006X133933
[16] Hsiao, W.-C. and Shih, Y.-S. (2007). Splitting variable selection for multivariate regression trees. Statist. Probab. Lett. 77 265-271. · Zbl 1106.62075 · doi:10.1016/j.spl.2006.08.014
[17] Kim, H., Loh, W. Y., Shih, Y. S. and Chaudhuri, P. (2007). Visualizable and interpretable regression models with good prediction power. IIE Transactions 39 565-579.
[18] Larsen, D. R. and Speckman, P. L. (2004). Multivariate regression trees for analysis of abundance data. Biometrics 60 543-549. · Zbl 1274.62807 · doi:10.1111/j.0006-341X.2004.00202.x
[19] Lee, S. K. (2005). On generalized multivariate decision tree by using GEE. Comput. Statist. Data Anal. 49 1105-1119. · Zbl 1429.62565
[20] Loh, W.-Y. (2002). Regression trees with unbiased variable selection and interaction detection. Statist. Sinica 12 361-386. · Zbl 0998.62042
[21] Loh, W.-Y. (2009). Improving the precision of classification trees. Ann. Appl. Stat. 3 1710-1737. · Zbl 1184.62109 · doi:10.1214/09-AOAS260
[22] Loh, W.-Y. and Shih, Y.-S. (1997). Split selection methods for classification trees. Statist. Sinica 7 815-840. · Zbl 1067.62545
[23] Segal, M. R. (1992). Tree structured methods for longitudinal data. J. Amer. Statist. Assoc. 87 407-418.
[24] Shih, Y. S. (2004). A note on split selection bias in classification trees. Comput. Statist. Data Anal. 45 457-466. · Zbl 1429.62264
[25] Singer, J. D. and Willett, J. B. (2003). Applied Longitudinal Data Analysis . Oxford Univ. Press, New York.
[26] Strobl, C., Boulesteix, A.-L. and Augustin, T. (2007). Unbiased split selection for classification trees based on the Gini index. Comput. Statist. Data Anal. 52 483-501. · Zbl 1452.62469
[27] Yan, J., Højsgaard, S. and Halekoh, U. (2012). geepack: Generalized estimating equation solver. R package version 1.1-6.
[28] Yeh, I. C. (2007). Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 474-480.
[29] Yu, Y. and Lambert, D. (1999). Fitting trees to functional data, with an application to time-of-day patterns. J. Comput. Graph. Statist. 8 749-762.
[30] Zhang, H. (1998). Classification trees for multiple binary responses. J. Amer. Statist. Assoc. 93 180-193. · Zbl 0906.62130 · doi:10.2307/2669615
[31] Zhang, H. and Ye, Y. (2008). A tree-based method for modeling a multivariate ordinal response. Stat. Interface 1 169-178. · Zbl 1230.62088 · doi:10.4310/SII.2008.v1.n1.a14
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or a perfect matching.