Model selection for the trend vector model. (English) Zbl 1360.62408

Summary: Model selection is an important component of data analysis. This study focuses on model selection for the trend vector model, a model for the analysis of longitudinal multinomial outcomes. The trend vector model is a so-called marginal model, focusing on population-averaged evolutions over time. A quasi-likelihood method is employed to obtain parameter estimates. Because the quasi-likelihood is not a true likelihood, likelihood-based statistics such as the likelihood ratio statistic are, in theory, invalidated. Moreover, standard errors obtained from the Hessian are biased. In this paper, the performance of different model selection methods for the trend vector model is studied in detail. We focus specifically on two aspects of model selection: variable selection and dimensionality determination. Based on the quasi-likelihood function, selection criteria analogous to the likelihood ratio statistic, AIC, and BIC were employed. Additionally, Wald and resampling statistics were included as variable selection criteria. A series of simulations was carried out to evaluate the relative performance of these criteria. The results suggest that model selection is best performed using either the quasi-likelihood ratio statistic or the quasi-BIC. A separate study of dimensionality selection found that the quasi-AIC also performs well when the degrees of freedom exceed 8. Another important finding is that the sandwich estimator of standard errors used in Wald statistics does not perform well: even for larger sample sizes, a bias-correction procedure for the sandwich estimator is needed to obtain satisfactory results.
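As a rough illustration of how quasi-likelihood analogues of the likelihood ratio statistic, AIC, and BIC are commonly formed, the following Python sketch assumes that each candidate model's maximized quasi-log-likelihood Q and parameter count k are already available (for example, from a fitted trend vector model). The penalties 2k and k log n are the textbook AIC/BIC penalties carried over to Q. This is an assumption for illustration, not necessarily the exact definitions used in the paper (Pan's QIC, for instance, replaces 2k with a trace-based correction). The function names and the numbers in the example are hypothetical.

```python
import math
from typing import Dict, Tuple

# Hypothetical helpers: the penalty terms are the textbook AIC/BIC penalties
# applied to a quasi-log-likelihood Q. They are an assumption, not
# necessarily the exact criteria defined in the paper.


def quasi_lr(q_full: float, q_reduced: float) -> float:
    """Quasi-likelihood ratio statistic for nested models: 2 * (Q_full - Q_reduced)."""
    return 2.0 * (q_full - q_reduced)


def quasi_aic(q: float, n_params: int) -> float:
    """AIC-type criterion -2Q + 2k (smaller is better)."""
    return -2.0 * q + 2.0 * n_params


def quasi_bic(q: float, n_params: int, n_obs: int) -> float:
    """BIC-type criterion -2Q + k log(n) (smaller is better).

    Whether n counts subjects or individual observations is left to the
    caller; that choice is itself an assumption.
    """
    return -2.0 * q + n_params * math.log(n_obs)


def criterion_value(q: float, n_params: int, n_obs: int, criterion: str) -> float:
    """Dispatch to the requested quasi-criterion."""
    if criterion == "aic":
        return quasi_aic(q, n_params)
    if criterion == "bic":
        return quasi_bic(q, n_params, n_obs)
    raise ValueError(f"unknown criterion: {criterion!r}")


def select(models: Dict[str, Tuple[float, int]], n_obs: int, criterion: str = "bic") -> str:
    """Return the name of the model with the smallest criterion value.

    `models` maps a model name to (maximized quasi-log-likelihood Q, number of parameters k).
    """
    return min(
        models,
        key=lambda name: criterion_value(models[name][0], models[name][1], n_obs, criterion),
    )


if __name__ == "__main__":
    # Made-up values for three nested candidate models (not results from the paper).
    candidates = {"A": (-520.0, 4), "B": (-510.0, 6), "C": (-508.5, 9)}
    print(select(candidates, n_obs=300, criterion="aic"))
    print(select(candidates, n_obs=300, criterion="bic"))
    print(quasi_lr(candidates["B"][0], candidates["A"][0]))
```

In this made-up example both criteria pick model B, and the quasi-likelihood ratio statistic for B versus A is 2(-510 + 520) = 20 on 2 degrees of freedom. Referring it to a chi-square distribution is the usual practice, but as the summary notes, that reference distribution is not theoretically guaranteed under quasi-likelihood, which is why its behaviour has to be checked by simulation.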

MSC:

62J12 Generalized linear models (logistic models)
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

bootstrap

References:

[1] ADACHI, K. (2000), “Scaling of a Longitudinal Variable with Time-Varying Representation of Individuals”, British Journal of Mathematical and Statistical Psychology, 53, 233-253. · doi:10.1348/000711000159312
[2] AKAIKE, H. (1974), “A New Look at the Statistical Model Identification”, IEEE Transactions on Automatic Control, 19(6), 716-723. · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[3] DE ROOIJ, M. (2009), “Trend Vector Models for the Analysis of Change in Continuous Time for Multiple Groups”, Computational Statistics and Data Analysis, 53, 3209-3216. · Zbl 1453.62080 · doi:10.1016/j.csda.2008.09.030
[4] DE ROOIJ, M., and SCHOUTEDEN, M. (2012), “The Mixed Effects Trend Vector Model”, Multivariate Behavioral Research, 47, 635-664. · doi:10.1080/00273171.2012.692640
[5] EFRON, B., and TIBSHIRANI, R.J. (1993), An Introduction to the Bootstrap, New York: Chapman and Hall. · Zbl 0835.62038
[6] FAY, M.P., and GRAUBARD, B.I. (2001), “Small-Sample Adjustments for Wald-Type Tests Using Sandwich Estimators”, Biometrics, 57, 1198-1206. · Zbl 1210.62133 · doi:10.1111/j.0006-341X.2001.01198.x
[7] HEDEKER, D., and GIBBONS, R.D. (2006), Longitudinal Data Analysis, New York: John Wiley & Sons. · Zbl 1136.62075
[8] KAUERMANN, G., and CARROLL, R.J. (2001), “A Note on the Efficiency of Sandwich Covariance Matrix Estimation”, Journal of the American Statistical Association, 96(456), 1387-1398. · Zbl 1073.62539 · doi:10.1198/016214501753382309
[9] LIANG, K.Y., and ZEGER, S.L. (1986), “Longitudinal Data Analysis Using Generalized Linear Models”, Biometrika, 73, 13-22. · Zbl 0595.62110 · doi:10.1093/biomet/73.1.13
[10] LIPSITZ, S.R., KIM, K., and ZHAO, L. (1994), “Analysis of Repeated Categorical Data Using Generalized Estimating Equations”, Statistics in Medicine, 13, 1149-1163. · doi:10.1002/sim.4780131106
[11] MANCL, L.A., and DEROUEN, T.A. (2001), “A Covariance Estimator for GEE with Improved Small-Sample Properties”, Biometrics, 57, 126-134. · Zbl 1209.62310 · doi:10.1111/j.0006-341X.2001.00126.x
[12] MOLENBERGHS, G., and VERBEKE, G. (2005), Models for Discrete Longitudinal Data, New York: Springer. · Zbl 1093.62002
[13] NEUHAUS, J.M. (1993), “Estimation Efficiency and Tests of Covariate Effects with Clustered Binary Data”, Biometrics, 49, 989-996. · doi:10.2307/2532241
[14] PAN, W. (2001), “Akaike’s Information Criterion in Generalized Estimating Equations”, Biometrics, 57, 120-125. · Zbl 1210.62099 · doi:10.1111/j.0006-341X.2001.00120.x
[15] PAN, W., and LE, C.T. (2001), “Bootstrap Model Selection in Generalized Linear Models”, Journal of Agricultural, Biological, and Environmental Statistics, 6, 49-61. · doi:10.1198/108571101300325139
[16] PAN, W., and WALL, M.M. (2002), “Small-Sample Adjustments in Using the Sandwich Variance Estimator in Generalized Estimating Equations”, Statistics in Medicine, 21, 1429-1441. · doi:10.1002/sim.1142
[17] PREISSER, J.S., and QAQISH, B.F. (1996), “Deletion Diagnostics for Generalized Estimating Equations”, Biometrika, 83, 551-562. · Zbl 0866.62041 · doi:10.1093/biomet/83.3.551
[18] SCHWARZ, G. (1978), “Estimating the Dimensions of a Model”, Annals of Statistics, 6, 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[19] SHERMAN, M., and LE CESSIE, S. (1997), “A Comparison Between Bootstrap Methods and Generalized Estimating Equations for Correlated Outcomes in Generalized Linear Models”, Communications in Statistics - Simulation and Computation, 26, 901-925. · Zbl 0901.62088 · doi:10.1080/03610919708813417