A cautionary case study of approaches to the treatment of missing data. (English) Zbl 1367.62297

Summary: This article presents findings from a case study of different approaches to the treatment of missing data. Simulations based on data from the Los Angeles Mammography Promotion in Churches Program (LAMP) led the authors to the following cautionary conclusions about the treatment of missing data: (1) Automated selection of the imputation model in the use of full Bayesian multiple imputation can lead to unexpected bias in coefficients of substantive models. (2) Under conditions that occur in actual data, casewise deletion can perform less well than we were led to expect by the existing literature. (3) Relatively unsophisticated imputations, such as mean imputation and conditional mean imputation, performed better than the technical literature led us to expect. (4) To underscore points (1), (2), and (3), the article concludes that imputation models are substantive models, and require the same caution with respect to specificity and calculability.
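As a rough illustration of three of the treatments named in the summary (casewise deletion, mean imputation, and conditional mean imputation), the following sketch compares the slope each one yields in a simple regression. It is not the authors' simulation design and does not use the LAMP data: the variables `z`, `x`, and `y`, the true slope of 2.0, the 30% missingness rate, and the missing-completely-at-random mechanism are all assumptions made for this example.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n=2000, p_missing=0.3, seed=0):
    """Hypothetical data: z predicts x, x predicts y (true slope 2.0).

    Values of x are deleted completely at random with probability
    p_missing; a missing value is stored as None.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [0.8 * zi + rng.gauss(0, 0.6) for zi in z]
    y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]
    x_obs = [xi if rng.random() >= p_missing else None for xi in x]
    return z, x_obs, y


def casewise_deletion(x_obs, y):
    """Keep only the cases in which x was observed."""
    keep = [i for i, xi in enumerate(x_obs) if xi is not None]
    return [x_obs[i] for i in keep], [y[i] for i in keep]


def mean_imputation(x_obs):
    """Replace each missing x with the mean of the observed x values."""
    m = statistics.fmean([xi for xi in x_obs if xi is not None])
    return [m if xi is None else xi for xi in x_obs]


def conditional_mean_imputation(z, x_obs):
    """Replace each missing x with its fitted value from regressing x on z."""
    zo = [zi for zi, xi in zip(z, x_obs) if xi is not None]
    xo = [xi for xi in x_obs if xi is not None]
    b = ols_slope(zo, xo)
    a = statistics.fmean(xo) - b * statistics.fmean(zo)
    return [a + b * zi if xi is None else xi for zi, xi in zip(z, x_obs)]


z, x_obs, y = simulate()
x_cc, y_cc = casewise_deletion(x_obs, y)
slopes = {
    "casewise deletion": ols_slope(x_cc, y_cc),
    "mean imputation": ols_slope(mean_imputation(x_obs), y),
    "conditional mean imputation": ols_slope(
        conditional_mean_imputation(z, x_obs), y
    ),
}
for name, slope in slopes.items():
    print(f"{name}: estimated slope = {slope:.3f}")
```

Because the imputation step here is itself a regression of `x` on `z`, changing that regression changes the imputed values and hence the substantive estimate, which is one way to read the summary's point that imputation models are substantive models. A single-imputation sketch like this one also says nothing about standard errors, which the multiple-imputation literature in the reference list addresses.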

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62D05 Sampling theory, sample surveys

Software:

Stata; DPJ; bootstrap; mvis

References:

[1] Allison PD (2001) Missing data. Sage Publications, Thousand Oaks
[2] Ambler G, Omar RZ (2007) A comparison of imputation techniques for handling missing predictor values in a risk model with a binary outcome. Stat Methods Med Res 16: 277–298 · Zbl 1122.62334 · doi:10.1177/0962280206074466
[3] Anderson AB, Basilevsky A, Hum DPJ (1983) Missing data: a review of the literature. In: Rossi, Wright, Anderson (eds) Handbook of survey research. Academic Press, New York
[4] Breen N, Kessler L (1994) Changes in the use of screening mammography: evidence from the 1987 and 1990 National Health Interview Surveys. Am J Public Health 84: 62–72 · doi:10.2105/AJPH.84.1.62
[5] Brick JM, Kalton G (1996) Handling missing data in survey research. Stat Methods Med Res 5: 215–238 · doi:10.1177/096228029600500302
[6] Carpenter JR, Kenward MG, White IR (2007) Sensitivity analysis after multiple imputation under missing at random: a weighting approach. Stat Methods Med Res 16: 259–275 · Zbl 1122.62300 · doi:10.1177/0962280206075303
[7] Efron B, Tibshirani RJ (1993) An introduction to the bootstrap. Chapman & Hall, New York · Zbl 0835.62038
[8] Farewell VT (1979) Some results on the estimation of logistic models based on retrospective data. Biometrika 66: 27–32 · Zbl 0448.62082 · doi:10.1093/biomet/66.1.27
[9] Fox J (1997) Applied regression analysis, linear models, and related methods. Sage Publications, Thousand Oaks
[10] Fox SA, Siu AL, Stein JA (1994) The importance of physician communication on breast-cancer screening of older women. Arch Intern Med 154: 2058–2068 · doi:10.1001/archinte.154.18.2058
[11] Fox SA, Pitkin K, Paul C, Carson S, Duan N (1998) Breast cancer screening adherence: does church attendance matter? Health Educ Behav 25: 742–758 · doi:10.1177/109019819802500605
[12] Groves RM, Singer E, Corning A (2000) Leverage–Saliency theory of survey participation. Public Opin Q 64: 299–308 · doi:10.1086/317990
[13] Heckman J (1976) The common structure of statistical models of truncation, sample selection, and limited dependent variables, and a simple estimator for such models. Ann Econ Soc Meas 5: 475–492
[14] Heckman J (1979) Sample selection bias as a specification error. Econometrica 47: 153–161 · Zbl 0392.62093 · doi:10.2307/1912352
[15] Jones MP (1996) Indicator and stratification methods for missing explanatory variables in multiple linear regression. J Am Stat Assoc 91: 222–230 · Zbl 0870.62053 · doi:10.2307/2291399
[16] Landerman LR, Land KC, Pieper CF (1997) An empirical evaluation of the predictive mean matching method for imputing missing values. Sociol Methods Res 26: 3–33 · doi:10.1177/0049124197026001001
[17] Little RJA (1992) Regression with missing X’s: a review. J Am Stat Assoc 87: 1227–1238 · doi:10.2307/2290664
[18] Little RJA, Rubin DB (2002) Statistical analysis with missing data, 2nd edn. Wiley, New York
[19] McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman & Hall, New York · Zbl 0744.62098
[20] Rao JNK, Shao J (1992) Jackknife variance estimation with survey data under hot deck imputation. Biometrika 79: 811–822 · Zbl 0764.62008 · doi:10.1093/biomet/79.4.811
[21] Royston P (2004) Multiple imputation of missing values. Stata J 4: 227–241
[22] Rubin DB (1987) Multiple imputation for nonresponse in surveys. Wiley, New York
[23] Rubin DB (1996) Multiple imputation after 18+ years. J Am Stat Assoc 91: 473–489 · Zbl 0869.62014 · doi:10.2307/2291635
[24] Rubin DB, Schenker N (1986) Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. J Am Stat Assoc 81: 366–374 · Zbl 0615.62011 · doi:10.2307/2289225
[25] Rubin DB, Schenker N (1991) Multiple imputation in health-care databases: an overview and some applications. Stat Med 10: 585–598 · doi:10.1002/sim.4780100410
[26] Schafer JL (1997a) Analysis of incomplete multivariate data. Chapman & Hall, London · Zbl 0997.62510
[27] Schafer JL (1997b) Software for multiple imputation. http://www.stat.psu.edu/~jls/misoftwa.html
[28] Tanner MA, Wong WH (1987) The calculation of posterior distributions by data augmentation (with discussion). J Am Stat Assoc 82: 528–550 · Zbl 0619.62029 · doi:10.2307/2289457
[29] Vach W (1994) Logistic regression with missing values in the covariates. Springer, New York · Zbl 0801.62061
[30] Xie Y, Manski CF (1989) The logit model and response-based samples. Sociol Methods Res 17: 283–302 · doi:10.1177/0049124189017003003