BAGEL: a non-ignorable missing value estimation method for mixed attribute datasets. (English) Zbl 1365.62415

Summary: Surveys are mainly conducted to obtain information on specified criteria from a target population. However, survey results often become biased when the subjects under study do not respond on highly significant attributes. Such non-ignorable missingness needs to be treated, and the actual values should be recovered. Many methods have been proposed for handling missing values in either discrete or continuous attributes, but a large gap remains in handling non-ignorable missing values in datasets with mixed attributes. To address this gap, this paper proposes a methodology called Bayesian Genetic Algorithm (BAGEL), which hybridizes Bayesian and genetic algorithm principles. In BAGEL, the initial population is generated using a Bayesian model, and the fitness values of the chromosomes are evaluated using Bayesian principles. BAGEL is applied to real datasets to impute both discrete and continuous missing values, and the imputation accuracy is observed. The experimental results show that BAGEL outperforms other standard imputation techniques. Statistical tests conducted to validate the experimental results also indicate that BAGEL performs better at all missing rates from 5% to 50%.
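The summary gives only the outline of BAGEL: a genetic algorithm whose initial population and fitness function both come from a Bayesian model. The sketch below is a toy illustration of that general idea, not the authors' algorithm. Everything in it is an assumption made for illustration: the function names (`fit_class_models`, `log_density`, `impute`), the row layout (a discrete label paired with one continuous value), the class-conditional normal model standing in for the Bayesian model (a plug-in estimate rather than a full posterior), and the selection, crossover, and mutation settings. It does not model a non-ignorable missingness mechanism.

```python
import math
import random
import statistics


def fit_class_models(rows):
    """Estimate (mean, std) of the continuous value per class from observed rows.

    Assumes every class has at least two distinct observed values.
    """
    by_class = {}
    for label, value in rows:
        if value is not None:
            by_class.setdefault(label, []).append(value)
    return {c: (statistics.mean(v), statistics.stdev(v)) for c, v in by_class.items()}


def log_density(x, mean, std):
    """Log of the normal density N(mean, std^2) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def impute(rows, pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Fill None values in (label, value) rows with a small genetic algorithm."""
    rng = random.Random(seed)
    models = fit_class_models(rows)
    missing = [i for i, (_, v) in enumerate(rows) if v is None]
    if not missing:
        return list(rows)
    # Model parameters for each missing cell, chosen by the row's class label.
    params = [models[rows[i][0]] for i in missing]

    def fitness(chrom):
        # A chromosome holds one candidate value per missing cell; its fitness
        # is the joint log-density of those values under the class models.
        return sum(log_density(g, m, s) for g, (m, s) in zip(chrom, params))

    # Initial population: candidates sampled from the fitted class models.
    population = [[rng.gauss(m, s) for m, s in params] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the current best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by occasional Gaussian mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [g + rng.gauss(0, 0.1 * s) if rng.random() < mutation_rate else g
                     for g, (_, s) in zip(child, params)]
            children.append(child)
        population = children

    best = max(population, key=fitness)
    filled = list(rows)
    for i, g in zip(missing, best):
        filled[i] = (rows[i][0], g)
    return filled
```

Calling `impute([("a", 1.0), ("a", 1.2), ("a", 0.8), ("a", None)])` returns the same rows with the `None` replaced by a value near the observed values of class `"a"`. With this simple fitness the search drifts toward the class mean, so the sketch shows only the mechanics of pairing a probabilistic model with a genetic search; it makes no claim about the accuracy reported in the paper.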

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
92D20 Protein sequences, DNA sequences

Software:

Mplus
