
On the performance of sequential regression multiple imputation methods with non-normal error distributions. (English) Zbl 1160.62064

Summary: Sequential regression multiple imputation has emerged as a popular approach for handling incomplete data with complex features. In this approach, imputations for each incomplete variable are produced from a regression model that uses the other variables as predictors, cycling through the variables in turn. A normality assumption is frequently imposed on the error distributions in the conditional regression models for continuous variables, even though it rarely holds in real scenarios. We use a simulation study to investigate the performance of several sequential regression imputation methods when the error distribution is flat or heavy tailed. The methods evaluated include sequential normal imputation and several of its extensions that adjust for non-normal error terms.
The results show that all methods perform well for estimating the marginal mean and proportion, as well as the regression coefficient, when the error distribution is flat or moderately heavy tailed. When the error distribution is strongly heavy tailed, all methods still perform well for the mean, and the adjusted methods perform robustly for the proportion; however, all methods can perform poorly for the regression coefficient because they cannot accommodate extreme values well. We caution against the mechanical use of sequential regression imputation without model checking and diagnostics.
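
The cyclic scheme described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes only two continuous variables, a simple linear regression for each conditional model, and normal errors with the regression parameters fixed at their least-squares estimates (a simplification, since a proper multiple imputation would also draw the parameters). The names `fit_line`, `sequential_impute`, and `multiply_impute` are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares intercept, slope, and residual SD for y ~ x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    sigma = (rss / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


def sequential_impute(data, n_cycles=10, rng=None):
    """Return one completed copy of `data` (rows [x, y], None = missing).

    Assumption: normal errors, parameters fixed at least-squares estimates.
    """
    rng = rng or random.Random(0)
    rows = [list(r) for r in data]
    missing = [[v is None for v in r] for r in data]

    # Starting values: fill each missing entry with the observed column mean.
    for j in range(2):
        col_mean = statistics.fmean(r[j] for r in rows if r[j] is not None)
        for r in rows:
            if r[j] is None:
                r[j] = col_mean

    # Cycle through the variables, regressing each on the other in turn.
    for _ in range(n_cycles):
        for target in (0, 1):
            pred = 1 - target
            # Fit on rows where the target was actually observed.
            idx = [i for i in range(len(rows)) if not missing[i][target]]
            a, b, s = fit_line([rows[i][pred] for i in idx],
                               [rows[i][target] for i in idx])
            # Redraw the missing target values: prediction plus normal noise.
            for i in range(len(rows)):
                if missing[i][target]:
                    rows[i][target] = a + b * rows[i][pred] + rng.gauss(0, s)
    return rows


def multiply_impute(data, m=5, seed=0):
    """Produce m completed data sets using independent random streams."""
    return [sequential_impute(data, rng=random.Random(seed + k))
            for k in range(m)]
```

Each completed data set would then be analysed separately and the estimates combined. Replacing `rng.gauss` with a draw from a flat or heavy-tailed distribution is one way to explore the kind of misspecification the summary studies, although the specific adjusted methods evaluated in the paper are not reproduced here.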

MSC:

62J05 Linear regression; mixed models
62F99 Parametric inference
65C60 Computational problems in statistics (MSC2010)
62G32 Statistics of extreme values; tail inference
62L99 Sequential statistical methods
Full Text: DOI
