
A zero-inflated beta-binomial model for microbiome data analysis. (English) Zbl 07851073

Summary: The microbiome is increasingly recognized as an important aspect of the health of host species. It is involved in many biological pathways and processes and is potentially useful as a source of health biomarkers. Taking advantage of high-throughput sequencing technologies, modern bacterial microbiome studies are metagenomic, interrogating thousands of taxa simultaneously. Several frameworks have been proposed for analyzing microbiome sequence read count data and for identifying the most significant features, but there is still room for improvement. We introduce a zero-inflated beta-binomial model for the distribution of microbiome count data and use it to determine association with a continuous or categorical phenotype of interest. The approach can exploit the mean-variance relationship to improve power, and it can adjust for covariates. The proposed method is a two-component mixture model: (i) a zero model that accounts for excess zeros and (ii) a count model that captures the remaining counts by beta-binomial regression, allowing for overdispersion. Simulation studies show that the proposed method effectively controls type I error and has higher power than competing methods to detect taxa associated with the phenotype. The R package ZIBBSeqDiscovery is available on CRAN.
Copyright © 2018 John Wiley & Sons, Ltd.
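The two-component mixture described in the summary can be illustrated with a small probability sketch. This is not code from ZIBBSeqDiscovery; all function names are hypothetical. It assumes a mean/overdispersion parameterization of the beta-binomial (mean proportion `mu`, intra-class correlation `phi`, a single constant `phi`), a structural-zero probability `pi`, and logit links for both the count mean and the zero probability. How the paper actually models the mean-variance relationship is not reproduced here.

```python
import math

def log_beta(a: float, b: float) -> float:
    """Log of the Beta function, computed via log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_pmf(k: int, n: int, mu: float, phi: float) -> float:
    """Beta-binomial pmf with mean proportion mu and overdispersion phi in (0, 1).

    Assumed parameterization: alpha + beta = (1 - phi) / phi, so that
    phi = 1 / (alpha + beta + 1) and Var = n*mu*(1-mu)*(1 + (n-1)*phi).
    """
    if not 0 <= k <= n:
        return 0.0
    s = (1.0 - phi) / phi  # alpha + beta
    a, b = mu * s, (1.0 - mu) * s
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

def zibb_pmf(k: int, n: int, pi: float, mu: float, phi: float) -> float:
    """Zero-inflated beta-binomial: a structural zero with probability pi,
    otherwise a beta-binomial count out of n reads."""
    bb = beta_binomial_pmf(k, n, mu, phi)
    if k == 0:
        return pi + (1.0 - pi) * bb
    return (1.0 - pi) * bb

def logistic(t: float) -> float:
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-t))

def zibb_loglik(counts, depths, X, Z, beta, gamma, phi) -> float:
    """Log-likelihood of a hypothetical ZIBB regression for one taxon.

    counts[i]: reads of the taxon in sample i; depths[i]: library size;
    X[i], Z[i]: covariate rows for the count and zero models;
    logit(mu_i) = X[i] . beta and logit(pi_i) = Z[i] . gamma.
    """
    total = 0.0
    for y, n, x, z in zip(counts, depths, X, Z):
        mu = logistic(sum(b * xi for b, xi in zip(beta, x)))
        pi = logistic(sum(g * zi for g, zi in zip(gamma, z)))
        total += math.log(zibb_pmf(y, n, pi, mu, phi))
    return total
```

Under this sketch the zero probability is inflated above the beta-binomial value, while the mean count is shrunk to `(1 - pi) * n * mu`. A phenotype could then enter as a column of `X` (and optionally `Z`), with association assessed, for example, by a likelihood ratio test comparing fits with and without that column.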

MSC:

62-XX Statistics
