
Multiple imputation methods for handling incomplete longitudinal and clustered data where the target analysis is a linear mixed effects model. (English) Zbl 1436.62572

Summary: Multiple imputation (MI) is increasingly popular for handling multivariate missing data. Two general approaches are available in standard computer packages: MI based on the posterior distribution of incomplete variables under a multivariate (joint) model, and fully conditional specification (FCS), which imputes missing values using univariate conditional distributions for each incomplete variable given all the others, cycling iteratively through the univariate imputation models. In the context of longitudinal or clustered data, it is not clear whether these approaches result in consistent estimates of regression coefficients and variance components when the analysis model of interest is a linear mixed effects model (LMM) that includes both random intercepts and random slopes, and when either the covariates alone or both the covariates and the outcome contain missing values. In the current paper, we compare the performance of seven different MI methods for handling missing values in longitudinal and clustered data in the context of fitting LMMs with both random intercepts and slopes. We study the theoretical compatibility between specific imputation models fitted under each of these approaches and the LMM, and also conduct simulation studies in both the longitudinal and clustered data settings. The simulations are motivated by analyses of the association between body mass index (BMI) and quality of life (QoL) in the Longitudinal Study of Australian Children (LSAC). Our findings show that the relative performance of the MI methods varies according to whether the incomplete covariate has fixed or random effects and whether there is missingness in the outcome variable. We show via simulation that compatible imputation and analysis models result in consistent estimation of both regression parameters and variance components. We illustrate our findings with an analysis of the LSAC data.
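To make the FCS cycle described in the summary concrete, the following is a minimal single-level sketch in Python with NumPy. It is not one of the seven methods compared in the paper and it ignores clustering entirely: each incomplete variable is imputed from a Bayesian normal linear regression on all the other variables, whereas methods for longitudinal or clustered data would use multilevel imputation models. The function names `draw_normal_imputations` and `fcs_impute` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def draw_normal_imputations(y_obs, X_obs, X_mis, rng):
    """Draw imputations for one variable from a Bayesian normal linear model.

    Assumption for this sketch: a single-level regression with a
    noninformative prior, so cluster structure is NOT modelled.
    """
    n, p = X_obs.shape
    xtx_inv = np.linalg.inv(X_obs.T @ X_obs)
    beta_hat = xtx_inv @ X_obs.T @ y_obs
    resid = y_obs - X_obs @ beta_hat
    # Draw the residual variance, then the coefficients given that variance,
    # so that parameter uncertainty is propagated into the imputations.
    sigma2 = (resid @ resid) / rng.chisquare(n - p)
    chol = np.linalg.cholesky(sigma2 * xtx_inv)
    beta = beta_hat + chol @ rng.standard_normal(p)
    noise = rng.normal(0.0, np.sqrt(sigma2), size=X_mis.shape[0])
    return X_mis @ beta + noise


def fcs_impute(data, n_iter=10, seed=0):
    """Return one completed copy of `data` (NaN marks missing values)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    # Start from column means, then refine by cycling through the variables.
    filled = np.where(missing, np.nanmean(data, axis=0), data)
    incomplete_cols = np.flatnonzero(missing.any(axis=0))
    for _ in range(n_iter):
        for j in incomplete_cols:
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            mis = missing[:, j]
            filled[mis, j] = draw_normal_imputations(
                filled[~mis, j], design[~mis], design[mis], rng
            )
    return filled
```

Under these assumptions, calling `fcs_impute` with several different `seed` values would give several completed data sets; the analysis model would then be fitted to each one and the results pooled with Rubin's rules.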

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

MICE
