
Sequential imputation for models with latent variables assuming latent ignorability. (English) Zbl 1521.62132

Summary: Models that involve an outcome variable, covariates, and latent variables are frequently the targets of estimation and inference. Missing covariate or outcome data present a challenge, particularly when missingness depends on the latent variables. This missingness mechanism is called latent ignorable or latent missing at random, and it is a generalisation of missing at random. Several authors have proposed approaches for handling latent ignorable missingness, but these methods rely on prior specification of the joint distribution of the complete data. In practice, specifying the joint distribution can be difficult and/or restrictive. We develop a novel sequential imputation procedure for imputing covariate and outcome data for models with latent variables under latent ignorable missingness. The proposed method does not require a joint model; rather, we use results under a joint model to inform imputation with less restrictive modelling assumptions. We discuss identifiability and convergence-related issues, and we present simulation results in several modelling settings. The method is motivated and illustrated by a study of head and neck cancer recurrence. In short, the method imputes missing data for models with latent variables under latent-dependent missingness without specifying a full joint model.
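To make the missingness mechanism concrete, the short simulation below illustrates latent ignorability; it is not the authors' sequential imputation procedure. One common way to write the condition is Pr(R | Y_obs, Y_mis, U) = Pr(R | Y_obs, U), where R is the missingness indicator and U the latent variable. Every setting here is a hypothetical choice for illustration: a binary latent class U with Pr(U = 1) = 0.4, a normal covariate X with mean 2U, and a probability that X is missing of 0.1 when U = 0 and 0.6 when U = 1. The sketch shows that the complete-case mean of X is biased, whereas within each latent class the observed values remain representative.

```python
import random
from statistics import fmean


def simulate(n=100_000, seed=0):
    """Hypothetical latent-ignorable setup: missingness of X depends only on latent U."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = 1 if rng.random() < 0.4 else 0   # latent class (unobserved in real data)
        x = rng.gauss(2.0 * u, 1.0)          # covariate whose mean depends on U
        p_miss = 0.6 if u == 1 else 0.1      # missingness probability depends on U only
        observed = rng.random() >= p_miss
        rows.append((u, x, observed))
    return rows


def summarise(rows):
    """Return the complete-case bias of the mean of X, overall and within each class."""
    full_mean = fmean(x for _, x, _ in rows)
    obs_mean = fmean(x for _, x, o in rows if o)
    stratum_bias = {}
    for k in (0, 1):
        full_k = fmean(x for u, x, _ in rows if u == k)
        obs_k = fmean(x for u, x, o in rows if u == k and o)
        stratum_bias[k] = obs_k - full_k
    return obs_mean - full_mean, stratum_bias


if __name__ == "__main__":
    marginal_bias, stratum_bias = summarise(simulate())
    print(f"complete-case bias: {marginal_bias:.3f}")
    print("within-class bias:", {k: round(b, 3) for k, b in stratum_bias.items()})
```

Analytically, the complete-case mean is (0.16 / 0.70) x 2, roughly 0.46, against a true mean of 0.8, so the printed overall bias should be close to -0.34, while both within-class biases should be near zero. Because U is never observed in practice, one cannot simply stratify on it; this is the gap that methods for latent ignorable missingness address.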

MSC:

62L12 Sequential estimation
62J05 Linear regression; mixed models
62P10 Applications of statistics to biology and medical sciences; meta analysis
62-08 Computational methods for problems pertaining to statistics
