
Regression trees and ensemble for multivariate outcomes. (English) Zbl 07683113

Summary: Tree-based methods have become one of the most flexible, intuitive, and powerful analytic tools for exploring complex data structures. Their best-documented, and arguably most popular, uses are in biomedical research, where multivariate outcomes are common (e.g., diastolic and systolic blood pressure, or nerve conduction measures in studies of neuropathy). Existing tree-based methods for multivariate outcomes do not adequately account for the correlation among the outcome components. In this paper, we develop goodness-of-split measures for building regression trees for continuous multivariate outcomes. We propose two general approaches: minimizing within-node heterogeneity and maximizing between-node separation. Within-node heterogeneity is measured using the average Mahalanobis distance and the determinant of the variance-covariance matrix. Between-node separation is measured using the Mahalanobis, Euclidean, and standardized Euclidean distances. To improve prediction accuracy, we extend the single multivariate regression tree to an ensemble of multivariate trees. Extensive simulations examine the properties of the proposed goodness-of-split measures. Finally, the methods are illustrated on two clinical datasets, one on neuropathy and one on pediatric cardiac surgery.
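The split measures named in the summary can be illustrated with a small numerical sketch. This is not the authors' implementation; it assumes NumPy, a single continuous covariate `x` searched over thresholds of the form `x <= t`, a minimum node size, daughter-node heterogeneities combined as a size-weighted sum, the average Mahalanobis distance computed with the covariance of the node being split held fixed, and between-node distances computed from the pooled daughter-node covariance. All function names (`best_split`, `bagged_stumps`, etc.) are hypothetical, and the ensemble is reduced to bagged one-split trees for brevity.

```python
import numpy as np

# Sketch of multivariate goodness-of-split measures. The weighting, pooling and
# choice of covariance below are illustrative assumptions, not the paper's definitions.

def avg_mahalanobis(Y, S_inv):
    """Mean Mahalanobis distance of the rows of Y from the node mean (S_inv held fixed)."""
    D = Y - Y.mean(axis=0)
    q = np.einsum("ij,jk,ik->i", D, S_inv, D)  # d_i' S^{-1} d_i for each row i
    return float(np.sqrt(q).mean())

def det_cov(Y):
    """Generalized variance: determinant of the node's sample covariance matrix."""
    return float(np.linalg.det(np.cov(Y, rowvar=False)))

def pooled_cov(YL, YR):
    """Pooled within-node covariance of the two candidate daughter nodes."""
    nL, nR = len(YL), len(YR)
    SL, SR = np.cov(YL, rowvar=False), np.cov(YR, rowvar=False)
    return ((nL - 1) * SL + (nR - 1) * SR) / (nL + nR - 2)

def separation(YL, YR, kind):
    """Distance between the daughter-node mean vectors."""
    diff = YL.mean(axis=0) - YR.mean(axis=0)
    S = pooled_cov(YL, YR)
    if kind == "mahalanobis":
        return float(np.sqrt(diff @ np.linalg.solve(S, diff)))
    if kind == "euclidean":
        return float(np.linalg.norm(diff))
    if kind == "std_euclidean":  # each coordinate scaled by its pooled variance
        return float(np.sqrt(np.sum(diff**2 / np.diag(S))))
    raise ValueError(f"unknown separation measure: {kind}")

def best_split(x, Y, criterion="mahalanobis", min_node=20):
    """Scan thresholds t on x (left node: x <= t); return (t, score), larger score = better."""
    S_inv = np.linalg.inv(np.cov(Y, rowvar=False))  # covariance of the node being split
    best_t, best_score = None, -np.inf
    for t in np.unique(x)[:-1]:
        left = x <= t
        YL, YR = Y[left], Y[~left]
        if len(YL) < min_node or len(YR) < min_node:
            continue
        if criterion == "avg_mahalanobis":  # within-node: smaller weighted heterogeneity wins
            score = -(len(YL) * avg_mahalanobis(YL, S_inv) + len(YR) * avg_mahalanobis(YR, S_inv))
        elif criterion == "det":
            score = -(len(YL) * det_cov(YL) + len(YR) * det_cov(YR))
        else:  # between-node: larger separation of daughter means wins
            score = separation(YL, YR, criterion)
        if score > best_score:
            best_t, best_score = float(t), score
    return best_t, best_score

def bagged_stumps(x, Y, n_trees=25, criterion="mahalanobis", seed=0):
    """Simplified ensemble (an assumption): bag one-split trees, average their daughter means."""
    rng = np.random.default_rng(seed)
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), len(x))  # bootstrap resample of subjects
        xb, Yb = x[idx], Y[idx]
        t, _ = best_split(xb, Yb, criterion)
        if t is not None:
            stumps.append((t, Yb[xb <= t].mean(axis=0), Yb[xb > t].mean(axis=0)))

    def predict(x_new):
        x_new = np.asarray(x_new)
        preds = [np.where((x_new <= t)[:, None], mL, mR) for t, mL, mR in stumps]
        return np.mean(preds, axis=0)  # average the multivariate predictions over trees

    return predict

# Hypothetical simulated data: the mean of a correlated bivariate outcome jumps at x = 0.5.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
shift = np.where((x > 0.5)[:, None], 3.0, 0.0)
Y = shift + rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
for c in ["mahalanobis", "euclidean", "std_euclidean", "avg_mahalanobis", "det"]:
    print(c, best_split(x, Y, c))
predict = bagged_stumps(x, Y)
print(predict(np.array([0.1, 0.9])))
```

On these simulated data every criterion should place its threshold near the change point at 0.5. Unweighted between-node distances can favour unbalanced end-cut splits when daughter nodes are small, which is one reason this sketch keeps `min_node` fairly large.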

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
68W01 General topics in the theory of algorithms
