Prediction and inference with missing data in patient alert systems. (English) Zbl 1437.62613

Summary: We describe the Bedside Patient Rescue (BPR) project, the goal of which is risk prediction of adverse events for non-intensive care unit patients using \(\sim 100\) variables (vitals, lab results, assessments, etc.). There are several missing predictor values for most patients, which in the health sciences is the norm rather than the exception. A Bayesian approach is presented that addresses many of the shortcomings of standard approaches to missing predictors: (i) treatment of the uncertainty due to imputation is straightforward in the Bayesian paradigm, (ii) the predictor distribution is flexibly modeled as an infinite normal mixture with latent variables to explicitly account for discrete predictors (i.e., as in multivariate probit regression models), and (iii) certain missing-not-at-random situations can be handled effectively by allowing the indicator of missingness into the predictor distribution only to inform the distribution of the missing variables. The proposed approach also has the benefit of providing a distribution for the prediction, including the uncertainty inherent in the imputation. Therefore, we can ask questions such as: Is it possible that this individual is at high risk, but we are missing too much information to know for sure? How much would we reduce the uncertainty in our risk prediction by obtaining a particular missing value? This approach is applied to the BPR problem, resulting in excellent predictive capability to identify deteriorating patients. Supplementary materials for this article, including a standardized description of the materials available for reproducing the work, are available as an online supplement.
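
The idea of a distribution for the prediction, rather than a single imputed risk, can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single bivariate normal for two predictors in place of the infinite normal mixture, and a logistic risk model with fixed coefficients in place of a fitted Bayesian regression. All names and numeric values (`conditional_normal`, `risk_draws`, the means, correlation, and coefficients) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def conditional_normal(x1, mu, sd, rho):
    """Mean and standard deviation of x2 given x1 under a bivariate normal."""
    mean = mu[1] + rho * sd[1] / sd[0] * (x1 - mu[0])
    std = sd[1] * math.sqrt(1.0 - rho ** 2)
    return mean, std


def risk(x1, x2, beta):
    """Logistic risk with fixed (assumed) coefficients beta = (b0, b1, b2)."""
    z = beta[0] + beta[1] * x1 + beta[2] * x2
    return 1.0 / (1.0 + math.exp(-z))


def risk_draws(x1, x2=None, n_draws=4000, seed=0):
    """Draws from the distribution of predicted risk.

    If x2 is missing, it is drawn from its conditional distribution given x1,
    so the spread of the returned risks reflects imputation uncertainty.
    """
    rng = random.Random(seed)
    mu, sd, rho = (0.0, 0.0), (1.0, 1.0), 0.5  # assumed predictor distribution
    beta = (-2.0, 1.0, 1.5)                    # assumed risk coefficients
    if x2 is not None:
        # Coefficients are fixed in this sketch, so nothing is left to vary.
        return [risk(x1, x2, beta)] * n_draws
    m, s = conditional_normal(x1, mu, sd, rho)
    return [risk(x1, rng.gauss(m, s), beta) for _ in range(n_draws)]


def interval(draws, lo=0.05, hi=0.95):
    """Empirical (lo, hi) quantile interval of the draws."""
    ordered = sorted(draws)
    n = len(ordered)
    return ordered[int(lo * (n - 1))], ordered[int(hi * (n - 1))]


missing = risk_draws(x1=1.0)            # x2 not measured
observed = risk_draws(x1=1.0, x2=0.5)   # x2 measured
print("x2 missing :", statistics.mean(missing), interval(missing))
print("x2 observed:", interval(observed))
```

Comparing the two intervals gives a rough answer to the second question in the summary: the width of the interval with `x2` missing is the uncertainty that measuring `x2` would remove. In a full Bayesian treatment the model parameters would also be uncertain, so the interval would not collapse to a point even with every predictor observed.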

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
62D05 Sampling theory, sample surveys

Software:

missForest; JAGS; Stan

References:

[1] Allison, P. D., “Multiple Imputation for Missing Data: A Cautionary Tale,”, Sociological Methods & Research, 28, 301-309 (2000) · doi:10.1177/0049124100028003003
[2] Amemiya, T., “Tobit Models: A Survey,”, Journal of Econometrics, 24, 3-61 (1984) · Zbl 0539.62121 · doi:10.1016/0304-4076(84)90074-5
[3] Canale, A.; Dunson, D. B., “Bayesian Multivariate Mixed-Scale Density Estimation,”, Statistics and Its Interface, 8, 195-201 (2015) · Zbl 1405.62037
[4] Berger, J. O., Statistical Decision Theory and Bayesian Analysis (2013), New York: Springer Science & Business Media
[5] Bernaards, C. A.; Belin, T. R.; Schafer, J. L., “Robustness of a Multivariate Normal Approximation for Imputation of Incomplete Binary Data,”, Statistics in Medicine, 26, 1368-1382 (2007) · doi:10.1002/sim.2619
[6] Bhattacharya, A.; Dunson, D. B., “Simplex Factor Models for Multivariate Unordered Categorical Data,”, Journal of the American Statistical Association, 107, 362-377 (2012) · Zbl 1263.62097 · doi:10.1080/01621459.2011.646934
[7] Buist, M. D.; Jarmolowski, E.; Burton, P. R.; Bernard, S. A.; Waxman, B. P.; Anderson, J., “Recognising Clinical Instability in Hospital Patients Before Cardiac Arrest or Unplanned Admission to Intensive Care. A Pilot Study in a Tertiary-Care Hospital,”, The Medical Journal of Australia, 171, 22-25 (1999) · doi:10.5694/j.1326-5377.1999.tb123492.x
[8] Chib, S.; Greenberg, E., “Analysis of Multivariate Probit Models,”, Biometrika, 85, 347-361 (1998) · Zbl 0938.62020 · doi:10.1093/biomet/85.2.347
[9] Dunson, D. B., “Bayesian Latent Variable Models for Clustered Mixed Outcomes,”, Journal of the Royal Statistical Society, Series B, 62, 355-366 (2000) · doi:10.1111/1467-9868.00236
[10] Dunson, D. B.; Park, J.-H., “Kernel Stick-Breaking Processes,”, Biometrika, 95, 307-323 (2008) · Zbl 1437.62448 · doi:10.1093/biomet/asn012
[11] Efron, B., “The Efficiency of Cox’s Likelihood Function for Censored Data,”, Journal of the American Statistical Association, 72, 557-565 (1977) · Zbl 0373.62020 · doi:10.1080/01621459.1977.10480613
[12] Ferguson, T. S., “A Bayesian Analysis of Some Nonparametric Problems,”, The Annals of Statistics, 1, 209-230 (1973) · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[13] Finch, W. H., “Imputation Methods for Missing Categorical Questionnaire Data: A Comparison of Approaches,”, Journal of Data Science, 8, 361-378 (2010)
[14] Friedman, J., “Greedy Function Approximation: A Gradient Boosting Machine,”, Annals of Statistics, 29, 1189-1232 (2001) · Zbl 1043.62034 · doi:10.1214/aos/1013203451
[15] Gebregziabher, M.; DeSantis, S. M., “Latent Class Based Multiple Imputation Approach for Missing Categorical Data,”, Journal of Statistical Planning and Inference, 140, 3252-3262 (2010) · Zbl 1204.62125 · doi:10.1016/j.jspi.2010.04.020
[16] Giudici, P.; Green, P., “Decomposable Graphical Gaussian Model Determination,”, Biometrika, 86, 785-801 (1999) · Zbl 0940.62019 · doi:10.1093/biomet/86.4.785
[17] Green, P. J., “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination,”, Biometrika, 82, 711-732 (1995) · Zbl 0861.62023 · doi:10.1093/biomet/82.4.711
[18] Green, P. J.; Hastie, D. I., “Reversible Jump MCMC,”, Genetics, 155, 1391-1403 (2009)
[19] Griffiths, J. R.; Kidney, E. M., “Current Use of Early Warning Scores in UK Emergency Departments,”, Emergency Medicine Journal, 29, 65-66 (2012) · doi:10.1136/emermed-2011-200508
[20] Gueorguieva, R. V.; Agresti, A., “A Correlated Probit Model for Joint Modeling of Clustered Binary and Continuous Responses,”, Journal of the American Statistical Association, 96, 1102-1112 (2001) · Zbl 1072.62612 · doi:10.1198/016214501753208762
[21] Heckman, J. J., “The Common Structure of Statistical Models of Truncation, Sample Selection and Limited Dependent Variables and a Simple Estimator for Such Models,”, Annals of Economic and Social Measurement, 5, 4, 475-492 (1976), Cambridge, MA: NBER
[22] Helton, J.; Johnson, J.; Sallaberry, C.; Storlie, C., “Survey of Sampling-Based Methods for Uncertainty and Sensitivity Analysis,”, Reliability Engineering and System Safety, 91, 1175-1209 (2006) · doi:10.1016/j.ress.2005.11.017
[23] Horton, N. J.; Kleinman, K. P., “Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models,”, The American Statistician, 61, 79-90 (2007) · doi:10.1198/000313007X172556
[24] Horton, N. J.; Lipsitz, S. R.; Parzen, M., “A Potential for Bias When Rounding in Multiple Imputation,”, The American Statistician, 57, 229-232 (2003) · Zbl 1182.62002 · doi:10.1198/0003130032314
[25] Imai, K.; van Dyk, D. A., “A Bayesian Analysis of the Multinomial Probit Model Using Marginal Data Augmentation,”, Journal of Econometrics, 124, 311-334 (2005) · Zbl 1335.62049 · doi:10.1016/j.jeconom.2004.02.002
[26] Ishwaran, H.; James, L. F., “Gibbs Sampling Methods for Stick-Breaking Priors,”, Journal of the American Statistical Association, 96 (2001) · Zbl 1014.62006 · doi:10.1198/016214501750332758
[27] Kim, S.; Tadesse, M. G.; Vannucci, M., “Variable Selection in Clustering via Dirichlet Process Mixture Models,”, Biometrika, 93, 877-893 (2006) · Zbl 1436.62266 · doi:10.1093/biomet/93.4.877
[28] Kirkland, L. L.; Malinchoc, M.; O’Byrne, M.; Benson, J. T.; Kashiwagi, D. T.; Burton, M. C.; Varkey, P.; Morgenthaler, T. I., “A Clinical Deterioration Prediction Tool for Internal Medicine Patients,”, American Journal of Medical Quality, 28, 135-142 (2013) · doi:10.1177/1062860612450459
[29] Kruschke, J., Doing Bayesian Data Analysis: A Tutorial With R, JAGS, and Stan (2014), New York: Academic Press · Zbl 1300.62001
[30] Lesaffre, E.; Kaufmann, H., “Existence and Uniqueness of the Maximum Likelihood Estimator for a Multivariate Probit Model,”, Journal of the American Statistical Association, 87, 805-811 (1992) · Zbl 0850.62421 · doi:10.1080/01621459.1992.10475282
[31] Li, F., Yu, Y., and Rubin, D. B. (2012), “Imputing Missing Data by Fully Conditional Models: Some Cautionary Examples and Guidelines,” Duke University Department of Statistical Science Discussion Paper 11-24.
[32] Lipsitz, S. R.; Ibrahim, J. G., “A Conditional Model for Incomplete Covariates in Parametric Regression Models,”, Biometrika, 83, 916-922 (1996) · Zbl 0885.62026 · doi:10.1093/biomet/83.4.916
[33] McCulloch, R. E.; Polson, N. G.; Rossi, P. E., “A Bayesian Analysis of the Multinomial Probit Model With Fully Identified Parameters,”, Journal of Econometrics, 99, 173-193 (2000) · Zbl 0958.62029 · doi:10.1016/S0304-4076(00)00034-8
[34] Molenberghs, G.; Beunckens, C.; Sotto, C.; Kenward, M. G., “Every Missingness Not at Random Model Has a Missingness at Random Counterpart With Equal Fit,”, Journal of the Royal Statistical Society, Series B, 70, 371-388 (2008) · Zbl 1148.62046 · doi:10.1111/j.1467-9868.2007.00640.x
[35] Müller, P.; Erkanli, A.; West, M., “Bayesian Curve Fitting Using Multivariate Normal Mixtures,”, Biometrika, 83, 67-79 (1996) · Zbl 0865.62029
[36] Murray, J. S.; Reiter, J. P., “Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models With Local Dependence,”, Journal of the American Statistical Association, 111, 1466-1479 (2016) · doi:10.1080/01621459.2016.1174132
[37] Nobile, A., “A Hybrid Markov Chain for the Bayesian Analysis of the Multinomial Probit Model,”, Statistics and Computing, 8, 229-242 (1998)
[38] Peberdy, M. A.; Kaye, W.; Ornato, J. P.; Larkin, G. L.; Nadkarni, V.; Mancini, M. E.; Berg, R. A.; Nichol, G.; Lane-Trultt, T.; NRCPR Investigators, “Cardiopulmonary Resuscitation of Adults in the Hospital: A Report of 14 720 Cardiac Arrests From the National Registry of Cardiopulmonary Resuscitation,”, Resuscitation, 58, 297-308 (2003) · doi:10.1016/S0300-9572(03)00215-6
[39] Prytherch, D. R.; Smith, G. B.; Schmidt, P. E.; Featherstone, P. I., “ViEWS—Towards a National Early Warning Score for Detecting Adult Inpatient Deterioration,”, Resuscitation, 81, 932-937 (2010) · doi:10.1016/j.resuscitation.2010.04.014
[40] Raftery, A. E.; Dean, N., “Variable Selection for Model-Based Clustering,”, Journal of the American Statistical Association, 101, 168-178 (2006) · Zbl 1118.62339 · doi:10.1198/016214506000000113
[41] Raghunathan, T. E.; Lepkowski, J. M.; Van Hoewyk, J.; Solenberger, P., “A Multivariate Technique for Multiply Imputing Missing Values Using a Sequence of Regression Models,”, Survey Methodology, 27, 85-96 (2001)
[42] Romero-Brufau, S.; Huddleston, J. M.; Naessens, J. M.; Johnson, M. G.; Hickman, J.; Morlan, B. W.; Jensen, J. B.; Caples, S. M.; Elmer, J. L.; Schmidt, J. A.; Morgenthaler, T. I., “Widely Used Track and Trigger Scores: Are They Ready for Automation in Practice?,”, Resuscitation, 85, 549-552 (2014) · doi:10.1016/j.resuscitation.2013.12.017
[43] Roy, J.; Lum, K. J.; Daniels, M. J.; Zeldow, B.; Dworkin, J.; Lo Re, V. III, “Bayesian Nonparametric Generative Models for Causal Inference With Missing at Random Covariates,”, arXiv no. 1702.08496 (2017)
[44] Rubin, D. B., “Inference and Missing Data,”, Biometrika, 63, 581-592 (1976) · Zbl 0344.62034 · doi:10.1093/biomet/63.3.581
[45] Schafer, J. L., Analysis of Incomplete Multivariate Data (1997), Boca Raton, FL: CRC Press · Zbl 0997.62510
[46] Schein, R.; Hazday, N.; Pena, M.; Ruben, B. H.; Sprung, C. L., “Clinical Antecedents to In-Hospital Cardiopulmonary Arrest,”, Chest Journal, 98, 1388-1392 (1990) · doi:10.1378/chest.98.6.1388
[47] Sethuraman, J., “A Constructive Definition of Dirichlet Priors,”, Statistica Sinica, 4, 639-650 (1994) · Zbl 0823.62007
[48] Shahbaba, B.; Neal, R., “Nonlinear Models Using Dirichlet Process Mixtures,”, Journal of Machine Learning Research, 10, 1829-1850 (2009) · Zbl 1235.62069
[49] Si, Y.; Reiter, J. P., “Nonparametric Bayesian Multiple Imputation for Incomplete Categorical Variables in Large-Scale Assessment Surveys,”, Journal of Educational and Behavioral Statistics, 38, 499-521 (2013) · doi:10.3102/1076998613480394
[50] Smith, G. B.; Prytherch, D. R.; Meredith, P.; Schmidt, P. E.; Featherstone, P. I., “The Ability of the National Early Warning Score (NEWS) to Discriminate Patients at Risk of Early Cardiac Arrest, Unanticipated Intensive Care Unit Admission, and Death,”, Resuscitation, 84, 465-470 (2013) · doi:10.1016/j.resuscitation.2012.12.016
[51] Stekhoven, D. J.; Bühlmann, P., “MissForest: Non-Parametric Missing Value Imputation for Mixed-Type Data,”, Bioinformatics, 28, 112-118 (2012) · doi:10.1093/bioinformatics/btr597
[52] Storlie, C. B.; Michalak, S. E.; Quinn, H. M.; Dubois, A. J.; Wender, S. A.; Dubois, D. H., “A Bayesian Reliability Analysis of Neutron-Induced Errors in High Performance Computing Hardware,”, Journal of the American Statistical Association, 108, 429-440 (2013) · Zbl 06195950 · doi:10.1080/01621459.2013.770694
[53] Storlie, C. B.; Myers, S.; Colligan, R. C.; Weaver, A. L.; Voigt, R.; Croarkin, P. E.; Leibson, C. L.; Stoeckel, R. E.; Katusic, S. K.; Port, J. D., “Model-Based Clustering With Mixed Continuous and Discrete Variables via Dirichlet Process Models,”, arXiv no. 1703.08741 (2017)
[54] Storlie, C. B.; Reich, B.; Helton, J.; Swiler, L.; Sallaberry, C., “Analysis of Computationally Demanding Models With Continuous and Categorical Inputs,”, Reliability Engineering and System Safety, 113, 30-41 (2013) · doi:10.1016/j.ress.2012.11.018
[55] Talhouk, A.; Doucet, A.; Murphy, K., “Efficient Bayesian Inference for Multivariate Probit Models With Sparse Inverse Correlation Matrices,”, Journal of Computational and Graphical Statistics, 21, 739-757 (2012) · doi:10.1080/10618600.2012.679239
[56] Van Buuren, S.; Brand, J. P.; Groothuis-Oudshoorn, C.; Rubin, D. B., “Fully Conditional Specification in Multivariate Imputation,”, Journal of Statistical Computation and Simulation, 76, 1049-1064 (2006) · Zbl 1144.62332 · doi:10.1080/10629360600810434
[57] Vermunt, J. K.; Van Ginkel, J. R.; Van der Ark, L. A.; Sijtsma, K., “Multiple Imputation of Incomplete Categorical Data Using Latent Class Analysis,”, Sociological Methodology, 38, 369-397 (2008) · doi:10.1111/j.1467-9531.2008.00202.x
[58] Wade, S.; Dunson, D. B.; Petrone, S.; Trippa, L., “Improving Prediction From Dirichlet Process Mixtures via Enrichment,”, The Journal of Machine Learning Research, 15, 1041-1071 (2014) · Zbl 1319.62085
[59] Wong, F.; Carter, C. K.; Kohn, R., “Efficient Estimation of Covariance Selection Models,”, Biometrika, 90, 809-830 (2003) · Zbl 1436.62346 · doi:10.1093/biomet/90.4.809
[60] Xu, D.; Daniels, M. J.; Winterstein, A. G., “Sequential BART for Imputation of Missing Covariates,”, Biostatistics, 17, 589-602 (2016) · doi:10.1093/biostatistics/kxw009
[61] Zhang, X.; Boscardin, W. J.; Belin, T. R., “Bayesian Analysis of Multivariate Nominal Measures Using Multivariate Multinomial Probit Models,”, Computational Statistics & Data Analysis, 52, 3697-3708 (2008) · Zbl 1452.62233 · doi:10.1016/j.csda.2007.12.012