
A gene-by-gene multiple comparison analysis: a predictive Bayesian approach. (English) Zbl 1329.62436

Summary: In this paper, we propose a hierarchical Bayesian framework with a Dirichlet process prior for gene-by-gene multiple comparison analysis. Comparisons among experimental conditions are made using the posterior probabilities of the hypotheses of equality or inequality. To calculate these posterior probabilities, we use the Pólya urn scheme through latent variables and the Bayes factor. The performance of the proposed method, together with a comparison with the usual Tukey test, is evaluated on artificial data and on a shotgun proteomics data set. The results show that the proposed methodology performs better in identifying differences in means and/or variances.
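
For orientation only, the Pólya urn scheme named in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `polya_urn_labels`, the concentration value `alpha = 1.0`, and the choice of five experimental conditions are all hypothetical. Conditions that receive the same label share a parameter value (the "equality" hypothesis); conditions with different labels do not.

```python
import random


def polya_urn_labels(n, alpha, rng):
    """Draw labels for n conditions from a Polya urn with concentration alpha > 0.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    labels = []
    for i in range(n):
        # Condition i opens a new group with probability alpha / (alpha + i);
        # otherwise it copies the label of a uniformly chosen earlier condition.
        if rng.random() < alpha / (alpha + i):
            labels.append(max(labels, default=-1) + 1)
        else:
            labels.append(rng.choice(labels))
    return labels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
labels = polya_urn_labels(5, 1.0, rng)
print(labels)  # equal labels mean the conditions share a parameter
```

Larger values of `alpha` make new groups more likely, so the prior puts more weight on the conditions being different.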

MSC:

62P10 Applications of statistics to biology and medical sciences; meta analysis
62F15 Bayesian inference
62J15 Paired and multiple comparisons; multiple testing
92D10 Genetics and epigenetics

Software:

vsn

References:

[1] Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics 2, 1152-1174. · Zbl 0335.60034 · doi:10.1214/aos/1176342871
[2] Arfin, S. M., Long, A. D., Ito, E. T., Tolleri, L., Riehle, M. M., Paegle, E. S. and Hatfield, G. W. (2000). Global gene expression profiling in Escherichia coli K12: The effects of integration host factor. The Journal of Biological Chemistry 275, 29672-29684.
[3] Baldi, P. and Long, A. D. (2001). A Bayesian framework for the analysis of microarray expression data: Regularized t-test and statistical inferences of gene changes. Bioinformatics 17, 509-519.
[4] Bhattacharya, S. (2008). Gibbs sampling based Bayesian analysis of mixtures with unknown number of components. Sankhyā 70, 133-155. · Zbl 1192.62073
[5] Blackwell, D. and MacQueen, J. B. (1973). Ferguson distributions via Pólya urn schemes. The Annals of Statistics 1, 353-355. · Zbl 0276.62010 · doi:10.1214/aos/1176342372
[6] Bolstad, B. M., Irizarry, R. A., Astrand, M. and Speed, T. P. (2003). A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19, 185-193.
[7] Casella, G., Robert, C. and Wells, M. (2004). Mixture models, latent variables and partitioned importance sampling. Statistical Methodology 1, 1-18. · Zbl 1075.65016 · doi:10.1016/j.stamet.2004.05.001
[8] Chen, J. J., Delongchamp, R. R., Tsai, C.-A., Hsueh, H.-m., Sistare, F., Thompson, K. L., Desai, V. G. and Fuscoe, J. C. (2004). Analysis of variance components in gene expression data. Bioinformatics 20, 1436-1446.
[9] Cox, D. R. and Reid, N. M. (2000). The Theory of the Design of Experiments. London: Chapman & Hall/CRC. · Zbl 1009.62061
[10] DeRisi, J. L., Iyer, V. R. and Brown, P. O. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science 278, 680-686.
[11] Escobar, M. D. (1994). Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89, 268-277. · Zbl 0791.62039 · doi:10.2307/2291223
[12] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, 577-588. · Zbl 0826.62021 · doi:10.2307/2291069
[13] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. The Annals of Statistics 1, 209-230. · Zbl 0255.62037 · doi:10.1214/aos/1176342360
[14] Fox, R. J. and Dimmic, M. W. (2006). A two-sample Bayesian t-test for microarray data. BMC Bioinformatics 7, 126.
[15] Goeman, J. J. and Bühlmann, P. (2007). Analyzing gene expression data in terms of gene sets: Methodological issues. Bioinformatics 23, 980-987.
[16] Gopalan, R. and Berry, D. A. (1998). Bayesian multiple comparisons using Dirichlet process priors. Journal of the American Statistical Association 93, 1130-1139. · Zbl 1063.62530 · doi:10.2307/2669856
[17] Hatfield, G. W., Hung, S. and Baldi, P. (2003). Differential analysis of DNA microarray gene expression data. Molecular Microbiology 47, 871-877.
[18] Huber, W., von Heydebreck, A., Sültmann, H., Poustka, A. and Vingron, M. (2002). Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Bioinformatics 18, S96-S104.
[19] Irizarry, R. A., Bolstad, B. M., Collin, F., Cope, L. M., Hobbs, B. and Speed, T. P. (2003). Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Research 31, e15.
[20] Jain, S. and Neal, R. M. (2004). A split-merge Markov chain Monte Carlo procedure for the Dirichlet process mixture model. Journal of Computational and Graphical Statistics 13, 158-182.
[21] Jain, S. and Neal, R. (2007). Splitting and merging components of a nonconjugate Dirichlet process mixture model. Bayesian Analysis 2, 445-472. · Zbl 1331.62145
[22] Kass, R. and Raftery, A. (1995). Bayes factors. Journal of the American Statistical Association 90, 773-795. · Zbl 0846.62028
[23] Lönnstedt, I. and Speed, T. P. (2002). Replicated microarray data. Statistica Sinica 12, 31-46. · Zbl 1004.62086
[24] Louzada, F., Saraiva, E. F., Milan, L. A. and Cobre, J. (2014). A predictive Bayes factor approach to identify genes differentially expressed: An application to Escherichia coli bacterium data. Brazilian Journal of Probability and Statistics 28, 167-189. · Zbl 1319.62215
[25] Medvedovic, M. and Sivaganesan, S. (2002). Bayesian infinite mixture model based clustering of gene expression profiles. Bioinformatics 18, 1194-1206.
[26] Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, 249-265.
[27] Parkitna, J. R., Korostynski, M., Kaminska-Chowaniec, D., Obara, I., Mika, J., Przewlocka, B. and Przewlocki, R. (2006). Comparison of gene expression profiles in neuropathic and inflammatory pain. Journal of Physiology and Pharmacology 57, 401-414.
[28] Pavlidis, P. (2003). Using ANOVA for gene selection from microarray studies of the nervous system. Methods 31, 282-289.
[29] Schena, M., Shalon, D., Davis, R. W. and Brown, P. O. (1995). Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270, 467-470.
[30] Shapiro, C. P. (1977). Classification by maximum posterior probability. The Annals of Statistics 5, 185-190. · Zbl 0364.62062 · doi:10.1214/aos/1176343752
[31] Smyth, G. K. and Speed, T. P. (2003). Normalization of cDNA microarray data. Methods 31, 265-273.
[32] Wu, T. D. (2001). Analysing gene expression data from DNA microarrays to identify candidate genes. Journal of Pathology 195, 53-65.
[33] Yang, Y. H., Dudoit, S., Luu, P., Lin, D. M., Peng, V., Ngai, J. and Speed, T. P. (2002). Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Research 30, e15.