Subgroup Identification with Classification and Regression Tree-Based Algorithms: an Application to the Ball State Adult Fitness Longitudinal Study

Bulletin of the Malaysian Mathematical Sciences Society

Abstract

Cardiorespiratory fitness (CRF) is not only an objective measure of physical activity but also a useful diagnostic and prognostic health indicator for patients in clinical settings. There is a well-established inverse relationship between CRF and mortality. However, the effect of CRF on mortality status may differ across subgroups of individuals and may be higher or lower than the estimated average effect. The objective of this study is therefore to identify subgroups in which CRF has a higher or lower impact on mortality status. In addition, we evaluate and compare tree-based and non-tree-based algorithms for identifying predictive features and subgroups. A penalized logistic regression with a least absolute shrinkage and selection operator (LASSO) penalty is used to identify features that may be associated with low CRF and all-cause mortality. The subgroup identification algorithms considered are virtual twins classification (VT(C)), generalized unbiased interaction detection and estimation (GUIDE) classification (Gc), GUIDE sum (Gs), and GUIDE interaction (Gi). They are applied to data from the Ball State Adult Fitness Longitudinal Lifestyle Study (BALL ST) to find subgroups of participants in which CRF has a positive or negative association with all-cause mortality. Overall, the results suggest that the tree-based methods (VT and GUIDE) naturally define subgroups with fewer predictors, whereas the non-tree-based method (logistic-LASSO) fails to find subgroups and only identifies predictors that affect mortality status. In terms of predictive variable selection and subgroup identification, Gi performs best among the tree-based and non-tree-based algorithms considered. Our study identifies subgroups that may benefit from higher CRF.
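
The logistic-LASSO screening step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' R code: the synthetic data, the function names (`soft_threshold`, `fit_logistic_lasso`), the penalty value `lam`, and the step size are all assumptions chosen for illustration. The sketch fits an L1-penalized logistic regression by proximal gradient descent, using only the Python standard library, and keeps the features whose coefficients are not shrunk exactly to zero.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero by `threshold`."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_lasso(X, y, lam, step=0.5, n_iter=500):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes the mean negative log-likelihood plus lam * sum(|beta_j|).
    The intercept is left unpenalized.
    """
    n, p = len(X), len(X[0])
    intercept, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            linear = intercept + sum(b * v for b, v in zip(beta, xi))
            resid = sigmoid(linear) - yi  # predicted probability minus outcome
            grad0 += resid / n
            for j in range(p):
                grad[j] += resid * xi[j] / n
        intercept -= step * grad0
        # Gradient step on the smooth part, then soft-thresholding for the penalty.
        beta = [soft_threshold(b - step * g, step * lam)
                for b, g in zip(beta, grad)]
    return intercept, beta


# Hypothetical data: 5 standardized features, only the first two affect the outcome.
random.seed(0)
n = 300
X = [[random.gauss(0.0, 1.0) for _ in range(5)] for _ in range(n)]
y = [1 if random.random() < sigmoid(-0.5 + 2.0 * x[0] - 1.5 * x[1]) else 0
     for x in X]

intercept, beta = fit_logistic_lasso(X, y, lam=0.05)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected feature indices:", selected)
```

A larger `lam` drives more coefficients to exactly zero, which is what makes the LASSO penalty usable for feature selection. As the abstract notes, this kind of model selects predictors but does not by itself define subgroups.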

Data Availability

Sample data are available at https://github.com/m0sumy01/data-and-materials

Code Availability

R code is available at https://github.com/m0sumy01/r-code

Author information

Contributions

SAS contributed to the design of the study, analyzed the data, and drafted the manuscript. MB contributed to the study design, interpreted the results, and critically reviewed the manuscript. MYAP contributed to the conceptualization and to reviewing the manuscript. MPH, BF, MW, FW, JP, and LK had full access to all data in the study. MPH and FW took responsibility for interpreting the data and checked the accuracy of the data analysis.

Corresponding author

Correspondence to Munni Begum.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Rafiqul I. Chowdhury.

About this article

Cite this article

Sumy, M.S.A., Begum, M., Harber, M.P. et al. Subgroup Identification with Classification and Regression Tree-Based Algorithms: an Application to the Ball State Adult Fitness Longitudinal Study. Bull. Malays. Math. Sci. Soc. 45 (Suppl 1), 445–459 (2022). https://doi.org/10.1007/s40840-022-01328-7

  • DOI: https://doi.org/10.1007/s40840-022-01328-7
