Bayesian modeling of interaction between features in sparse multivariate count data with application to microbiome study. (English) Zbl 07789362

Summary: Many statistical methods have been developed for the analysis of microbial community profiles, but because typical microbiome measurements are complex, inferring interactions between microbial features remains challenging. We develop a Bayesian zero-inflated rounded log-normal kernel method to model interactions between microbial features in a community using multivariate count data in the presence of covariates and excess zeros. The model constructs the interaction structure by imposing joint sparsity on the covariance matrix of the kernel, and it obtains a reliable estimate of that structure even with a small sample size. The model also includes zero inflation to account for the excess zeros observed in the data and infers differential abundance of microbial features associated with covariates through log-linear regression. We provide simulation studies and real data analysis examples to demonstrate the developed model. Comparisons with a simpler model and popular alternatives in simulation studies show that, in addition to providing important insight into feature interactions, the model yields superior parameter estimates and model fit in various settings.
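To make the ingredients named in the summary concrete, the following is a minimal generative sketch in Python, not the authors' specification. It assumes, purely for illustration, that the latent log-scale values have a log-linear mean in the covariates, that the sparse covariance takes a factor form \(\Lambda\Lambda^{\top} + \sigma^2 I\) with a sparse loading matrix \(\Lambda\), that rounding is done by taking the floor of the exponentiated latent value, and that excess zeros are added independently per feature. The names `simulate_counts`, `implied_covariance`, `Lambda`, `sigma` and `pi_zero` are hypothetical.

```python
import math
import random


def implied_covariance(Lambda, sigma):
    """Covariance of the latent log-scale vector: Lambda Lambda^T + sigma^2 I.

    Assumption for illustration only: a factor form in which zeros in the
    J x K loading matrix Lambda produce zeros in the covariance.
    """
    J, K = len(Lambda), len(Lambda[0])
    cov = [[sum(Lambda[a][k] * Lambda[b][k] for k in range(K)) for b in range(J)]
           for a in range(J)]
    for j in range(J):
        cov[j][j] += sigma ** 2
    return cov


def simulate_counts(X, beta, Lambda, sigma, pi_zero, seed=0):
    """Simulate an n x J count matrix from the illustrative model.

    X       : n x P covariate rows
    beta    : J x P log-linear regression coefficients (hypothetical layout)
    Lambda  : J x K sparse loadings
    sigma   : idiosyncratic standard deviation on the log scale
    pi_zero : length-J probabilities of an excess (structural) zero
    """
    rng = random.Random(seed)
    J, K, P = len(Lambda), len(Lambda[0]), len(beta[0])
    Y = []
    for x in X:
        eta = [rng.gauss(0.0, 1.0) for _ in range(K)]  # shared latent factors
        row = []
        for j in range(J):
            mean = sum(beta[j][p] * x[p] for p in range(P))        # log-linear mean
            latent = (mean
                      + sum(Lambda[j][k] * eta[k] for k in range(K))
                      + sigma * rng.gauss(0.0, 1.0))
            count = math.floor(math.exp(latent))                    # rounded log-normal
            if rng.random() < pi_zero[j]:                           # zero inflation
                count = 0
            row.append(count)
        Y.append(row)
    return Y


# Four features, two factors; features 0 and 1 share a factor, feature 3 loads on none.
Lambda = [[1.0, 0.0], [0.8, 0.0], [0.0, 0.5], [0.0, 0.0]]
beta = [[1.0, 0.5], [1.5, -0.5], [2.0, 0.0], [0.5, 0.0]]
X = [[1.0, float(i % 2)] for i in range(20)]  # intercept plus a binary covariate
Y = simulate_counts(X, beta, Lambda, sigma=0.5, pi_zero=[0.3, 0.3, 0.1, 0.5])
Sigma = implied_covariance(Lambda, sigma=0.5)
```

In this sketch, features that share no factor have zero latent covariance, which is one simple way to read "joint sparsity on the covariance matrix"; the paper's actual prior and rounding thresholds may differ.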

MSC:

62Pxx Applications of statistics
