An evolutionary Monte Carlo algorithm for Bayesian block clustering of data matrices. (English) Zbl 1471.62083

Summary: In many applications, it is of interest to cluster the row and column variables of a data set simultaneously, identifying local subgroups within a data matrix that share some common characteristic. When only a small set of variables is believed to be associated with a set of responses, block clustering (biclustering) is a more appropriate technique than one-dimensional clustering. A flexible framework for Bayesian model-based block clustering is proposed that can determine multiple block clusters in a data matrix through a novel and efficient evolutionary Monte Carlo-based methodology. The performance of this methodology is illustrated through a number of simulation studies and an application to data from genome-wide association studies.
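
Illustration (not from the paper): the summary does not spell out the sampler, so the following Python sketch only shows the general idea of a population of tempered chains searching for one block cluster. Everything specific here is an assumption made for the example: a single planted block with a known Gaussian mean shift `mu`, a flat prior on the row and column indicators, the temperature ladder, and all function names. Only the mutation and exchange moves of evolutionary Monte Carlo are included; the crossover move is omitted for brevity.

```python
import math
import random


def make_data(n_rows, n_cols, block_rows, block_cols, mu, rng):
    """N(0, 1) noise matrix with mean `mu` inside one planted block (toy data)."""
    return [[rng.gauss(mu if (i in block_rows and j in block_cols) else 0.0, 1.0)
             for j in range(n_cols)] for i in range(n_rows)]


def log_score(x, rows, cols, mu):
    """Log-likelihood ratio of 'selected block has mean mu' versus 'all noise'."""
    return sum(mu * x[i][j] - 0.5 * mu * mu
               for i, r in enumerate(rows) if r
               for j, c in enumerate(cols) if c)


def mutate(state, rng):
    """Symmetric proposal: flip one row or one column membership indicator."""
    rows, cols = list(state[0]), list(state[1])
    k = rng.randrange(len(rows) + len(cols))
    if k < len(rows):
        rows[k] = 1 - rows[k]
    else:
        cols[k - len(rows)] = 1 - cols[k - len(rows)]
    return rows, cols


def run_emc(x, mu, temps, n_iter, seed=0):
    """Return the highest-scoring (rows, cols) state visited and its score."""
    rng = random.Random(seed)
    n_rows, n_cols = len(x), len(x[0])
    # One chain per temperature; temps[0] == 1.0 targets the untempered posterior.
    pop = [([rng.randint(0, 1) for _ in range(n_rows)],
            [rng.randint(0, 1) for _ in range(n_cols)]) for _ in temps]
    scores = [log_score(x, r, c, mu) for r, c in pop]
    start = max(range(len(pop)), key=scores.__getitem__)
    best_state, best_score = pop[start], scores[start]
    for _ in range(n_iter):
        # Mutation: one Metropolis update within each tempered chain.
        for k, t in enumerate(temps):
            prop = mutate(pop[k], rng)
            s = log_score(x, prop[0], prop[1], mu)
            if rng.random() < math.exp(min(0.0, (s - scores[k]) / t)):
                pop[k], scores[k] = prop, s
                if s > best_score:
                    best_state, best_score = prop, s
        # Exchange: propose swapping the states of two adjacent temperatures.
        k = rng.randrange(len(temps) - 1)
        log_acc = (scores[k + 1] - scores[k]) * (1.0 / temps[k] - 1.0 / temps[k + 1])
        if rng.random() < math.exp(min(0.0, log_acc)):
            pop[k], pop[k + 1] = pop[k + 1], pop[k]
            scores[k], scores[k + 1] = scores[k + 1], scores[k]
    return best_state, best_score


if __name__ == "__main__":
    data_rng = random.Random(1)
    x = make_data(12, 8, {0, 1, 2, 3, 4}, {0, 1, 2}, 3.0, data_rng)
    (rows, cols), score = run_emc(x, 3.0, [1.0, 2.0, 4.0, 8.0], 1500, seed=2)
    print("rows:", [i for i, r in enumerate(rows) if r])
    print("cols:", [j for j, c in enumerate(cols) if c])
```

The hotter chains accept downhill moves more readily and pass promising states to the colder chains through the exchange step. A sampler for multiple blocks with unknown parameters, as described in the summary, would need a richer model and additional moves.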

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)
62P10 Applications of statistics to biology and medical sciences; meta analysis
65C40 Numerical analysis or methods applied to Markov chains

Software:

PLINK
Full Text: DOI

References:

[1] Bottolo, L.; Richardson, S., Evolutionary stochastic search for Bayesian model exploration, Bayesian Anal., 5, 3, 583-618, (2010) · Zbl 1330.90042
[2] Bouveyron, C.; Brunet-Saumard, C., Model-based clustering of high-dimensional data: a review, Comput. Statist. Data Anal., (2013) · Zbl 1471.62032
[3] Cheng, Y.; Church, G., Biclustering of expression data, Proc. Int. Conf. Intell. Syst. Mol. Biol., 8, 93-103, (2000)
[4] Conover, C. A., Insulin-like growth factor-binding proteins and bone metabolism, Am. J. Physiol. Endocrinol. Metab., 294, 1, E10-E14, (2008)
[5] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B, 39, 1, 1-38, (1977) · Zbl 0364.62022
[6] Deng, H. W.; Mahaney, M. C.; Williams, J. T.; Li, J.; Conway, T.; Davies, K. M.; Li, J. L.; Deng, H.; Recker, R. R., Relevance of the genes for bone mass variation to susceptibility to osteoporotic fractures and its implications to gene search for complex human diseases, Genet. Epidemiol., 22, 12-25, (2002)
[7] Diebolt, J.; Robert, C. P., Estimation of finite mixture distributions through Bayesian sampling, J. Roy. Statist. Soc. Ser. B, 56, 2, 363-375, (1994) · Zbl 0796.62028
[8] Fraley, C.; Raftery, A. E., Model-based clustering, discriminant analysis, and density estimation, J. Amer. Statist. Assoc., 97, 458, 611-631, (2002) · Zbl 1073.62545
[9] Fraley, C.; Raftery, A.; Wehrens, R., Incremental model-based clustering for large datasets with small clusters, J. Comput. Graph. Statist., 14, 3, 529-546, (2005)
[10] Gelman, A., Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper), Bayesian Anal., 1, 3, 515-533, (2006), (electronic) · Zbl 1331.62139
[11] George, E.; McCulloch, R., Variable selection via Gibbs sampling, J. Amer. Statist. Assoc., 88, 881-889, (1993)
[12] Geyer, C., Markov chain Monte Carlo maximum likelihood, Computing Science and Statistics: the 23rd Symposium on the Interface, 156-163, (1991)
[13] Geyer, C. J.; Thompson, E. A., Annealing Markov chain Monte Carlo with applications to ancestral inference, J. Amer. Statist. Assoc., 90, 431, 909-920, (1995) · Zbl 0850.62834
[14] Ghahramani, Z.; Griffiths, T. L.; Sollich, P., Bayesian nonparametric latent feature models, Bayesian Statist., 8, 1-25, (2006)
[15] Goswami, G.; Liu, J. S.; Wong, W. H., Evolutionary Monte Carlo methods for clustering, J. Comput. Graph. Statist., 16, 855-876, (2007)
[16] Govaert, G.; Nadif, M., Block clustering with Bernoulli mixture models: comparison of different approaches, Comput. Statist. Data Anal., 52, 6, 3233-3245, (2008) · Zbl 1452.62444
[17] Green, P. J., Reversible jump MCMC and Bayesian model determination, Biometrika, 82, 711-732, (1995) · Zbl 0861.62023
[18] Gu, J.; Liu, J., Bayesian biclustering of gene expression data, BMC Genomics, 9, Suppl 1, S4, (2008)
[19] Gupta, M.; Cheung, C. L.; Hsu, Y. H.; Demissie, S.; Cupples, L. A.; Kiel, D. P.; Karasik, D., Identification of homogeneous genetic architecture of multiple genetically correlated traits by block clustering of genome-wide associations, J. Bone Miner. Res., 26, 6, 1261-1271, (2011)
[20] Hartigan, J. A., Direct clustering of a data matrix, J. Amer. Statist. Assoc., 67, 337, 123-129, (1972)
[21] Huang, D. W.; Sherman, B. T.; Lempicki, R. A., Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists, Nucleic Acids Res., 37, 1, 1-13, (2009)
[22] Hubert, L.; Arabie, P., Comparing partitions, J. Classification, 2, 1, 193-218, (1985)
[23] Karasik, D.; Shimabuku, N. A.; Zhou, Y.; Zhang, Y.; Cupples, L. A.; Kiel, D. P.; Demissie, S., A genome wide linkage scan of metacarpal size and geometry in the Framingham study, Am. J. Hum. Biol., 20, 663-670, (2008)
[24] Kluger, Y.; Basri, R.; Chang, J.; Gerstein, M., Spectral biclustering of microarray data: coclustering genes and conditions, Genome Res., 13, 703-716, (2003)
[25] Kou, S. C.; Zhou, Q.; Wong, W. H., The equi-energy sampler with applications in statistical inference and statistical mechanics, Ann. Statist., 34, 4, 1581-1619, (2006), (with discussion) · Zbl 1246.82054
[26] Lange, K., Mathematical and statistical methods for genetic analysis, (2002), Springer Press · Zbl 0991.92017
[27] Lazzeroni, L.; Owen, A., Plaid models for gene expression data, Statist. Sinica, 12, 1, 61-86, (2002) · Zbl 1004.62084
[28] Li, J., Clustering based on a multilayer mixture model, J. Comput. Graph. Statist., 14, 3, 547-568, (2005)
[29] Liang, F.; Wong, W. H., Evolutionary Monte Carlo: applications to \(C_p\) model sampling and change point problem, Statist. Sinica, 10, 317-342, (2000) · Zbl 1054.65500
[30] MacEachern, S. N.; Müller, P., Estimating mixture of Dirichlet process models, J. Comput. Graph. Statist., 7, 2, 223-238, (1998)
[31] McLachlan, G.; Peel, D., Finite mixture models, vol. 299, (2000), Wiley-Interscience · Zbl 0963.62061
[32] Oti, M.; Huynen, M. A.; Brunner, H. G., Phenome connections, Trends Genet., 24, 103-106, (2008)
[33] Purcell, S.; Neale, B.; Todd-Brown, K.; Thomas, L.; Ferreira, M. A.; Bender, D.; Maller, J.; Sklar, P.; de Bakker, P. I.; Daly, M. J.; Sham, P. C., PLINK: a tool set for whole-genome association and population-based linkage analyses, Am. J. Hum. Genet., 81, 559-575, (2007)
[34] Rand, W. M., Objective criteria for the evaluation of clustering methods, J. Amer. Statist. Assoc., 66, 336, 846-850, (1971)
[35] Rivadeneira, F.; Zillikens, M. C.; De Laet, C. E. D. H.; Hofman, A.; Uitterlinden, A. G.; Beck, T. J.; Pols, H. A. P., Femoral neck BMD is a strong predictor of hip fracture susceptibility in elderly men and women because it detects cortical bone instability: the Rotterdam study, J. Bone Miner. Res., 22, 11, 1781-1790, (2007)
[36] Segal, E.; Battle, A.; Koller, D., Decomposing gene expression into cellular processes, Pac. Symp. Biocomput., 89-100, (2003) · Zbl 1219.92027
[37] Tanay, A.; Sharan, R.; Shamir, R., Discovering statistically significant biclusters in gene expression data, Bioinformatics, 18, Suppl 1, S136-S144, (2002)
[38] Tseng, G.; Wong, W., Tight clustering: a resampling-based approach for identifying stable and tight patterns in data, Biometrics, 61, 10-16, (2005) · Zbl 1077.62049
[39] Zheng, Q.; Wang, X. J., GOEAST: a web-based software toolkit for gene ontology enrichment analysis, Nucleic Acids Res., 36, Web Server Issue, W358-W363, (2008)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.