Discovering interactions using covariate informed random partition models. (English) Zbl 1475.62263

Summary: Combination chemotherapy treatment regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have had great success in improving cure rates. Unfortunately, patients prescribed these types of treatment regimens have displayed susceptibility to the onset of osteonecrosis. Some have suggested that this is due to pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, without a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion alone. In this paper we propose an exploratory technique that is able to discover associations between covariates and responses in a general way. The procedure connects covariates to responses flexibly through dependent random partition distributions and then employs machine learning techniques to highlight potential associations found in each cluster. We provide a simulation study to show utility and apply the method to data produced from a study dedicated to learning which physiological predictors influence severity of osteonecrosis multiplicatively.

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62G05 Nonparametric estimation
62H12 Estimation in multivariate analysis

References:

[1] Agrawal, R., Mannila, H., Srikant, R., Toivonen, H. and Verkamo, A. I. (1996). Fast discovery of association rules. In Advances in Knowledge Discovery and Data Mining (U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy, eds.) 307-328. American Association for Artificial Intelligence, Menlo Park, CA, USA.
[2] Agrawal, R., Trippe, B., Huggins, J. and Broderick, T. (2019). The kernel interaction trick: Fast Bayesian discovery of pairwise interactions in high dimensions. In Proceedings of the 36th International Conference on Machine Learning (K. Chaudhuri and R. Salakhutdinov, eds.). Proceedings of Machine Learning Research 97 141-150. PMLR, Long Beach, CA.
[3] American Cancer Society (2018). Survival Rates for Childhood Leukemias. https://www.cancer.org/cancer/leukemia-in-children/detection-diagnosis-staging/survival-rates.html [Accessed: 18 March 2018].
[4] Bao, J. and Hanson, T. E. (2015). Bayesian nonparametric multivariate ordinal regression. Canad. J. Statist. 43 337-357. · Zbl 1321.62033 · doi:10.1002/cjs.11253
[5] Barcella, W., De Iorio, M., Favaro, S. and Rosner, G. L. (2018). Dependent generalized Dirichlet process priors for the analysis of acute lymphoblastic leukemia. Biostatistics 19 342-358. · doi:10.1093/biostatistics/kxx042
[6] Barrera-Gómez, J., Agier, L., Portengen, L., Chadeau-Hyam, M., Giorgis-Allemand, L., Siroux, V., Robinson, O., Vlaanderen, J., González, J. R. et al. (2017). A systematic comparison of statistical methods to detect interactions in exposome-health associations. Environ. Health 16 74.
[7] Berger, J. O., Wang, X. and Shen, L. (2014). A Bayesian approach to subgroup identification. J. Biopharm. Statist. 24 110-129. · doi:10.1080/10543406.2013.856026
[8] Bien, J., Taylor, J. and Tibshirani, R. (2013). A LASSO for hierarchical interactions. Ann. Statist. 41 1111-1141. · Zbl 1292.62109 · doi:10.1214/13-AOS1096
[9] Chen, T. and Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’16 785-794. Association for Computing Machinery, New York, NY, USA. · doi:10.1145/2939672.2939785
[10] Chen, Y. and Hanson, T. E. (2014). Bayesian nonparametric \(k\)-sample tests for censored and uncensored data. Comput. Statist. Data Anal. 71 335-346. · Zbl 1471.62041 · doi:10.1016/j.csda.2012.11.003
[11] Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y., Cho, H., Chen, K., Mitchell, R., Cano, I. et al. (2019). xgboost: Extreme gradient boosting. R package version 0.90.0.2.
[12] Chung, Y. and Dunson, D. B. (2009). Nonparametric Bayes conditional distribution modeling with variable selection. J. Amer. Statist. Assoc. 104 1646-1660. · Zbl 1205.62039 · doi:10.1198/jasa.2009.tm08302
[13] Dahl, D. B. (2020). salso: Sequentially-allocated latent structure optimization. R package version 0.1.11.
[14] De Iorio, M., Müller, P., Rosner, G. L. and MacEachern, S. N. (2004). An ANOVA model for dependent random measures. J. Amer. Statist. Assoc. 99 205-215. · Zbl 1089.62513 · doi:10.1198/016214504000000205
[15] Du, J. and Linero, A. R. (2018). Interaction detection with Bayesian decision tree ensembles. arXiv:1809.08524.
[16] Dunson, D. B., Pillai, N. and Park, J.-H. (2007). Bayesian density regression. J. R. Stat. Soc. Ser. B. Stat. Methodol. 69 163-183. · Zbl 1120.62025 · doi:10.1111/j.1467-9868.2007.00582.x
[17] Fan, J., Yao, Q. and Tong, H. (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika 83 189-206. · Zbl 0865.62026 · doi:10.1093/biomet/83.1.189
[18] Ferrari, F. and Dunson, D. B. (2019). Bayesian factor analysis for inference on interactions. arXiv:1904.11603v1.
[19] Gabry, J., Simpson, D., Vehtari, A., Betancourt, M. and Gelman, A. (2019). Visualization in Bayesian workflow. J. Roy. Statist. Soc. Ser. A 182 389-402. · doi:10.1111/rssa.12378
[20] George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection. Statist. Sinica 7 339-374. · Zbl 0884.62031
[21] Hahsler, M., Buchta, C., Gruen, B. and Hornik, K. (2015). arules: Mining association rules and frequent itemsets. R package version 1.3-1.
[22] Han, J., Kamber, M. and Pei, J. (2012). Data Mining Concepts and Techniques, 1st ed. Elsevier. · Zbl 1230.68018
[23] Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer Series in Statistics. Springer, New York. · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[24] Heba, I., Amany, A., Ahmed, S. E. and Amr, S. (2014). Novel data-mining methodologies for detecting drug-drug interactions: A review of pharmacovigilance literature. In Advances in Environmental Sciences, Development and Chemistry (W. L. Staff, ed.) 301-314. Wseas LLC.
[25] Henderson, N. C., Louis, T. A., Rosner, G. L. and Varadhan, R. (2020). Individualized treatment effects with censored data via fully nonparametric Bayesian accelerated failure time models. Biostatistics 21 50-68. · doi:10.1093/biostatistics/kxy028
[26] Hu, J., Joshi, A. and Johnson, V. E. (2009). Log-linear models for gene association. J. Amer. Statist. Assoc. 104 597-607. · Zbl 1388.62219 · doi:10.1198/jasa.2009.0025
[27] Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730-773. · Zbl 1068.62079 · doi:10.1214/009053604000001147
[28] Ishwaran, H., Rao, J. S. and Kogalur, U. B. (2013). spikeslab: Prediction and variable selection using spike and slab regression. R package version 1.1.5.
[29] Kapelner, A. and Bleich, J. (2016). bartMachine: Machine learning with Bayesian additive regression trees. J. Stat. Softw. 70 4. · Zbl 1328.62243
[30] Karbowiak, E. and Biecek, P. (2019). EIX: Explain interactions in 'XGBoost'. R package version 1.0.
[31] Kawedia, J. D., Kaste, S. C., Pei, D., Panetta, J. C., Cai, X., Cheng, C., Neale, G., Howard, S. C., Evans, W. E. et al. (2011). Pharmacokinetic, pharmacodynamic, and pharmacogenetic determinants of osteonecrosis in children with acute lymphoblastic leukemia. Blood 117 2340-2347.
[32] Kottas, A., Müller, P. and Quintana, F. (2005). Nonparametric Bayesian modeling for multivariate ordinal data. J. Comput. Graph. Statist. 14 610-625. · doi:10.1198/106186005X63185
[33] Lim, M. and Hastie, T. (2015). Learning interactions via hierarchical group-lasso regularization. J. Comput. Graph. Statist. 24 627-654. · doi:10.1080/10618600.2014.938812
[34] Lim, M. and Hastie, T. (2019). glinternet: Learning interactions via hierarchical group-lasso regularization. R package version 1.0.10.
[35] Liu, J., Sivaganesan, S., Laud, P. W. and Müller, P. (2017). A Bayesian subgroup analysis using collections of ANOVA models. Biom. J. 59 746-766. · Zbl 1369.62297 · doi:10.1002/bimj.201600064
[36] MacEachern, S. N. (2000). Dependent Dirichlet processes. Technical Report, Ohio State Univ., Dept. Statistics.
[37] Mitra, R., Müller, P. and Ji, Y. (2017). Bayesian multiplicity control for multiple graphs. Canad. J. Statist. 45 44-61. · Zbl 1462.62365 · doi:10.1002/cjs.11305
[38] Müller, P., Quintana, F. and Rosner, G. L. (2011). A product partition model with regression on covariates. J. Comput. Graph. Statist. 20 260-278. Supplementary material available online. · doi:10.1198/jcgs.2011.09066
[39] Müller, P., Quintana, F. A., Jara, A. and Hanson, T. (2015). Bayesian Nonparametric Data Analysis. Springer Series in Statistics. Springer, Cham. · Zbl 1333.62003 · doi:10.1007/978-3-319-18968-0
[40] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 9 249-265. · doi:10.2307/1390653
[41] Page, G. L. and Quintana, F. A. (2018). Calibrating covariate informed product partition models. Stat. Comput. 28 1009-1031. · Zbl 1405.62123 · doi:10.1007/s11222-017-9777-z
[42] Page, G. L., Quintana, F. A. and Rosner, G. L. (2021a). Supplement to “Discovering interactions using covariate informed random partition models.” https://doi.org/10.1214/20-AOAS1372SUPPA
[43] Page, G. L., Quintana, F. A. and Rosner, G. L. (2021b). Source code for “Discovering interactions using covariate informed random partition models.” https://doi.org/10.1214/20-AOAS1372SUPPB
[44] R Core Team (2017). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
[45] Reich, B. J., Kalendra, E., Storlie, C. B., Bondell, H. D. and Fuentes, M. (2012). Variable selection for high dimensional Bayesian density estimation: Application to human exposure simulation. J. R. Stat. Soc. Ser. C. Appl. Stat. 61 47-66. · doi:10.1111/j.1467-9876.2011.00772.x
[46] Schnell, P. M., Tang, Q., Offen, W. W. and Carlin, B. P. (2016). A Bayesian credible subgroups approach to identifying patient subgroups with positive treatment effects. Biometrics 72 1026-1036. · Zbl 1390.62306 · doi:10.1111/biom.12522
[47] Scott, J. G. and Berger, J. O. (2010). Bayes and empirical-Bayes multiplicity adjustment in the variable-selection problem. Ann. Statist. 38 2587-2619. · Zbl 1200.62020 · doi:10.1214/10-AOS792
[48] Shen, W. and Ghosal, S. (2016). Adaptive Bayesian density regression for high-dimensional data. Bernoulli 22 396-420. · Zbl 1388.62115 · doi:10.3150/14-BEJ663
[49] Simon, R. (2002). Bayesian subset analysis: Application to studying treatment-by-gender interactions. Stat. Med. 21 2909-2916.
[50] Smith, M. and Kohn, R. (1996). Nonparametric regression using Bayesian variable selection. J. Econometrics 75 317-343. · Zbl 0864.62025
[51] Su, X., Peña, A. T., Liu, L. and Levine, R. A. (2018). Random forests of interaction trees for estimating individualized treatment effects in randomized trials. Stat. Med. 37 2547-2560. · doi:10.1002/sim.7660
[52] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267-288. · Zbl 0850.62538
[53] Tokdar, S. T., Zhu, Y. M. and Ghosh, J. K. (2010). Bayesian density regression with logistic Gaussian process and subspace projection. Bayesian Anal. 5 319-344. · Zbl 1330.62182 · doi:10.1214/10-BA605
[54] Varadhan, R. and Wang, S.-J. (2014). Standardization for subgroup analysis in randomized controlled trials. J. Biopharm. Statist. 24 154-167. · doi:10.1080/10543406.2013.856023
[55] Venables, W. N. and Ripley, B. D. (2002). Modern Applied Statistics with S, 4th ed. Springer, New York. ISBN 0-387-95457-0. · Zbl 1006.62003 · doi:10.1007/978-1-4899-2819-1
[56] Wade, S. and Ghahramani, Z. (2018). Bayesian cluster analysis: Point estimation and credible balls (with discussion and a reply by the authors). Bayesian Anal. 13 559-626. · Zbl 1407.62241 · doi:10.1214/17-BA1073
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.