Regression Trees and Ensemble for Multivariate Outcomes

Sankhya B

Abstract

Tree-based methods have become one of the most flexible, intuitive, and powerful analytic tools for exploring complex data structures. The best-documented and arguably most popular uses of tree-based methods are in biomedical research, where multivariate outcomes occur commonly (e.g., diastolic and systolic blood pressure, and nerve conduction measures in studies of neuropathy). Existing tree-based methods for multivariate outcomes do not appropriately take into account the correlation that exists in such data. In this paper, we develop goodness-of-split measures for building multivariate regression trees for continuous multivariate outcomes. We propose two general approaches: maximizing within-node homogeneity and maximizing between-node separation. Within-node homogeneity is measured using the average Mahalanobis distance and the determinant of the variance-covariance matrix. Between-node separation is measured using the Mahalanobis distance, Euclidean distance, and standardized Euclidean distance. To enhance prediction accuracy, we extend the single multivariate regression tree to an ensemble of multivariate trees. Extensive simulations are presented to examine the properties of our goodness-of-split measures. Finally, the proposed methods are illustrated using two clinical datasets, one of neuropathy and one of pediatric cardiac surgery.
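To make the split criteria named in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes NumPy, a single continuous predictor, and a variance-covariance matrix estimated from the parent node; that last choice is an assumption of this sketch, since the preview does not state which covariance estimate the measures use. The function names (`avg_mahalanobis`, `log_det_cov`, `between_mahalanobis`, `best_split`), the `min_node` parameter, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def avg_mahalanobis(Y, cov_inv):
    """Mean squared Mahalanobis distance of the rows of Y to their own mean."""
    d = Y - Y.mean(axis=0)
    return float(np.einsum("ij,jk,ik->i", d, cov_inv, d).mean())

def log_det_cov(Y):
    """Log-determinant of the node's sample variance-covariance matrix."""
    sign, logdet = np.linalg.slogdet(np.cov(Y, rowvar=False))
    return float(logdet) if sign > 0 else float("-inf")

def between_mahalanobis(Y_left, Y_right, cov_inv):
    """Squared Mahalanobis distance between the two child-node mean vectors."""
    d = Y_left.mean(axis=0) - Y_right.mean(axis=0)
    return float(d @ cov_inv @ d)

def best_split(x, Y, min_node=5):
    """Scan thresholds on one predictor; return (threshold, separation)."""
    # Assumption of this sketch: the covariance comes from the parent node.
    cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))
    best = (None, -np.inf)
    for t in np.unique(x)[:-1]:
        left = x <= t
        if left.sum() < min_node or (~left).sum() < min_node:
            continue  # skip splits that leave a child node too small
        score = between_mahalanobis(Y[left], Y[~left], cov_inv)
        if score > best[1]:
            best = (float(t), score)
    return best

# Simulated example: two correlated outcomes whose means shift at x = 0.5.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(size=n)
cov = np.array([[1.0, 0.7], [0.7, 1.0]])
Y = rng.multivariate_normal([0.0, 0.0], cov, size=n)
Y[x > 0.5] += np.array([3.0, 3.0])

threshold, separation = best_split(x, Y)
left = x <= threshold

# Within-node measures before and after the chosen split.
parent_cov_inv = np.linalg.inv(np.cov(Y, rowvar=False))
parent_avg = avg_mahalanobis(Y, parent_cov_inv)
child_avg = (left.sum() * avg_mahalanobis(Y[left], parent_cov_inv)
             + (~left).sum() * avg_mahalanobis(Y[~left], parent_cov_inv)) / n
print(threshold, separation, parent_avg, child_avg)
```

In this sketch a good split shows up in two ways: the between-node distance is large, and the within-node measures (the average Mahalanobis distance and the log-determinant) are smaller in the child nodes than in the parent. Both within-node quantities grow with heterogeneity, so smaller values correspond to more homogeneous nodes.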

Data Availability

Study datasets are available from the corresponding author on reasonable request.

Acknowledgments

None.

Funding

Dr. Reynolds is supported by NIH K99DK129785. Dr. Banerjee is supported by NIH R21CA152775.

Author information

Contributions

Dr. Reynolds developed the methodological approach, performed and interpreted results from the simulation study and illustrative examples, and wrote the manuscript. Dr. Callaghan was integrally involved in interpretation of the data and critical revisions of the manuscript. Dr. Gaies was integrally involved in interpretation of the data and critical revisions of the manuscript. Dr. Banerjee developed the methodological approach, interpreted results from the simulation study and illustrative examples, and wrote the manuscript.

Corresponding author

Correspondence to Evan L. Reynolds.

About this article

Cite this article

Reynolds, E.L., Callaghan, B.C., Gaies, M. et al. Regression Trees and Ensemble for Multivariate Outcomes. Sankhya B 85, 77–109 (2023). https://doi.org/10.1007/s13571-023-00301-z

  • DOI: https://doi.org/10.1007/s13571-023-00301-z
