Multiple imputation for discrete data: evaluation of the joint latent normal model. (English) Zbl 1429.62595

Summary: Missing data are ubiquitous in clinical and social research, and multiple imputation (MI) is increasingly the methodology of choice for practitioners. Two principal strategies for imputation have been proposed in the literature: joint modelling multiple imputation (JM-MI) and full conditional specification multiple imputation (FCS-MI). While JM-MI is arguably a preferable approach, because it involves specification of an explicit imputation model, FCS-MI is pragmatically appealing, because of its flexibility in handling different types of variables. JM-MI has developed from the multivariate normal model, and latent normal variables have been proposed as a natural way to extend this model to handle categorical variables. In this article, we evaluate the latent normal model through an extensive simulation study and an application to data from the German Breast Cancer Study Group, comparing the results with FCS-MI. We divide our investigation into four sections, focusing on (i) binary, (ii) categorical, (iii) ordinal, and (iv) count data. Using data simulated from both the latent normal model and the general location model, we find that in all but one extreme general location model setting JM-MI works very well, and sometimes outperforms FCS-MI. We conclude that the latent normal model, implemented in the R package jomo, can be used with confidence by researchers, for both single-level and multilevel multiple imputation.
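
The latent normal idea mentioned in the summary can be illustrated with a minimal Python sketch. It is not the jomo implementation and makes strong simplifying assumptions: a single continuous covariate x with unit variance, a binary variable y defined by thresholding a latent normal z at zero, and model parameters (the correlation `rho` and latent mean `mu_z`) treated as known, whereas a real imputation procedure would draw them from their posterior. All function and variable names are hypothetical.

```python
import random
from statistics import NormalDist, mean


def impute_binary(rng, x, rho, mu_z=0.0):
    """Impute a missing binary y given x under an assumed latent normal model.

    Assumes (x, z) is bivariate normal with unit variances and correlation
    rho, so z | x ~ N(mu_z + rho * x, 1 - rho**2), and y = 1 exactly when z > 0.
    """
    conditional_sd = (1.0 - rho**2) ** 0.5
    z = rng.gauss(mu_z + rho * x, conditional_sd)
    return 1 if z > 0 else 0


def prob_y1_given_x(x, rho, mu_z=0.0):
    """Analytic P(y = 1 | x) implied by the same model (a probit curve in x)."""
    conditional_sd = (1.0 - rho**2) ** 0.5
    return NormalDist().cdf((mu_z + rho * x) / conditional_sd)


rng = random.Random(1)   # fixed seed so the sketch is reproducible
rho = 0.6                # assumed, known correlation between x and latent z
x_value = 1.0            # covariate value of a record whose y is missing

# Repeated imputations for the same record: each draw is a 0/1 value.
draws = [impute_binary(rng, x_value, rho) for _ in range(20000)]
empirical = mean(draws)
theoretical = prob_y1_given_x(x_value, rho)
print(f"share of imputed y = 1: {empirical:.3f}; model probability: {theoretical:.3f}")
```

Because the binary value is obtained by thresholding a normal draw, the imputation step stays within the multivariate normal machinery, which is the sense in which the latent normal model extends JM-MI to categorical data.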

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis

Software:

R; REALCOM; jomo; countimp; MICE

References:

[1] Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669-679. · Zbl 0774.62031
[2] Audigier, V., White, I. R., Debray, T., Jolani, S., Quartagno, M., Carpenter, J., … Resche‐Rigon, M. (2018). Multiple imputation for multilevel data with continuous and binary variables. Statistical Science, 33(2), 160-183. · Zbl 1397.62265
[3] Bartlett, J. W., Seaman, S., White, I. R., & Carpenter, J. R. (2015). Multiple imputation of covariates by fully conditional specification: Accommodating the substantive model. Statistical Methods in Medical Research, 24, 462-487.
[4] Browne, W. J. (2006). MCMC algorithms for constrained variance matrices. Computational Statistics and Data Analysis, 50(7), 1655-1677. · Zbl 1445.62048
[5] van Buuren, S., Brand, J. P. L., Groothuis‐Oudshoorn, K. C., & Rubin, D. B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049-1064. · Zbl 1144.62332
[6] Carpenter, J., & Kenward, M. (2013). Multiple imputation and its application. Hoboken, New Jersey: Wiley. ISBN: 978‐0‐470‐74052‐1. · Zbl 1352.62008
[7] Carpenter, J. R., Goldstein, H., & Kenward, M. G. (2011). REALCOM‐IMPUTE software for multilevel multiple imputation with mixed response types. Journal of Statistical Software, 45(5), 1-14.
[8] Goldstein, H., Carpenter, J., & Browne, W. (2014). Fitting multilevel multivariate models with missing data in responses and covariates that may include interactions and non‐linear terms. Journal of the Royal Statistical Society: Series A, 177(2), 553-564.
[9] Goldstein, H., Carpenter, J., Kenward, M., & Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling, 9(3), 173-197. · Zbl 07257700
[10] Goldstein, H., & Kounali, D. (2009). Multilevel multivariate modelling of childhood growth, numbers of growth measurements and adult characteristics. Journal of the Royal Statistical Society: Series A (Statistics in Society), 172(3), 599-613.
[11] Horton, N. J., Lipsitz, S. R., & Parzen, M. (2003). A potential for bias when rounding in multiple imputation. The American Statistician, 57(4), 229-232. · Zbl 1182.62002
[12] Hughes, R. A., White, I. R., Seaman, S. R., Carpenter, J. R., Tilling, K., & Sterne, J. A. (2014). Joint modelling rationale for chained equations. BMC Medical Research Methodology, 14, 28.
[13] Kalaycioglu, O., Copas, A., King, M., & Omar, R. Z. (2016). A comparison of multiple‐imputation methods for handling missing data in repeated measurements observational studies. Journal of the Royal Statistical Society: Series A (Statistics in Society), 179(3), 683-706.
[14] Keogh, R. H., & Morris, T. P. (2018). Multiple imputation in Cox regression when there are time‐varying effects of covariates. Statistics in Medicine, 37(25), 3661-3678.
[15] Kleinke, K., & Reinecke, J. (2013). countimp: Multiple imputation of incomplete count data. R Foundation for Statistical Computing. R Package, version 1.0.
[16] Lee, K. J., & Carlin, J. B. (2017). Multiple imputation in the presence of non‐normal data. Statistics in Medicine, 36(4), 606-617.
[17] Lee, M. C., & Mitra, R. (2016). Multiply imputing missing values in data sets with mixed measurement scales using a sequence of generalised linear models. Computational Statistics and Data Analysis, 95(C), 24-38. · Zbl 1468.62113
[18] Liu, J., Gelman, A., Hill, J., Su, Y.‐S., & Kropko, J. (2013). On the stationary distribution of iterative imputations. Biometrika, 101(1), 155-173. · Zbl 1285.62058
[19] Muthén, L., & Muthén, B. (2017). Mplus Version 8 User’s Guide. Los Angeles: Muthén & Muthén.
[20] Quartagno, M., & Carpenter, J. (2014). jomo: A package for multilevel joint modelling multiple imputation.
[21] Quartagno, M., Grund, S., & Carpenter, J. (2018). jomo: A flexible package for two‐level joint modelling multiple imputation. Revision under review for the R Journal.
[22] Sauerbrei, W., Royston, P., Bojar, H., Schmoor, C., & Schumacher, M. (1999). Modelling the effects of standard prognostic factors in node‐positive breast cancer. German Breast Cancer Study Group (GBSG). British Journal of Cancer, 79(11-12), 1752-1760.
[23] Schafer, J. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall. · Zbl 0997.62510
[24] Schumacher, M., Bastert, G., Bojar, H., Hubner, K., Olschewski, M., Sauerbrei, W., … Rauschecker, H. F. (1994). Randomized 2 x 2 trial evaluating hormonal treatment and the duration of chemotherapy in node‐positive breast cancer patients. German Breast Cancer Study Group. Journal of Clinical Oncology, 12(10), 2086-2093.
[25] Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82(398), 528-540. · Zbl 0619.62029
[26] van Buuren, S., & Groothuis‐Oudshoorn, K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software, 45(3), 1-67.
[27] White, I. R., & Royston, P. (2009). Imputing missing covariate values for the Cox model. Statistics in Medicine, 28(15), 1982-1998.
[28] Wu, W., Jia, F., & Enders, C. (2015). A comparison of imputation strategies for ordinal missing data on Likert scale variables. Multivariate Behavioral Research, 50(5), 484-503.
[29] Zhang, X., Boscardin, W. J., & Belin, T. R. (2008). Bayesian analysis of multivariate nominal measures using multivariate multinomial probit models. Computational Statistics and Data Analysis, 52(7), 3697-3708. · Zbl 1452.62233
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.