Estimating Bayesian networks for high-dimensional data with complex mean structure and random effects. (English) Zbl 1334.62011

Summary: The estimation of Bayesian networks given high-dimensional data, in particular gene expression data, has been the focus of much recent research. Whilst there are several methods available for the estimation of such networks, these typically assume that the data consist of independent and identically distributed samples. It is often the case, however, that the available data have a more complex mean structure, plus additional components of variance, which must then be accounted for in the estimation of a Bayesian network. In this paper, score metrics that take account of such complexities are proposed for use in conjunction with score-based methods for the estimation of Bayesian networks. We propose, first, a fully Bayesian score metric and, second, a metric inspired by the notion of restricted maximum likelihood. We demonstrate the performance of these new metrics for the estimation of Bayesian networks using simulated data with known complex mean structures. We then present an analysis of the expression levels of grape-berry genes, adjusting for exogenous variables believed to affect those expression levels. Demonstrable biological effects can be inferred from the estimated conditional independence relationships and correlations amongst the grape-berry genes.

MSC:

62A09 Graphical methods in statistics
62F15 Bayesian inference
68T35 Theory of languages and software systems (knowledge-based systems, expert systems, etc.) for artificial intelligence
62H25 Factor analysis and principal components; correspondence analysis
62-07 Data analysis (statistics) (MSC2010)
05C90 Applications of graph theory
62P10 Applications of statistics to biology and medical sciences; meta analysis
92D10 Genetics and epigenetics

Software:

Graphviz; R

References:

[1] Chickering, Large-sample learning of Bayesian networks is NP-hard, Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence pp 124– (2003) · Zbl 1222.68169
[2] Coombe, The regulation of set and development of the grape berry, Acta Horticulturae 34 pp 261– (1973) · doi:10.17660/ActaHortic.1973.34.36
[3] Davison, Statistical Models (2003) · doi:10.1017/CBO9780511815850
[4] De Jong, Modeling and simulation of genetic regulatory systems: A literature review, J. Comput. Biol. 9 pp 67– (2002) · doi:10.1089/10665270252833208
[5] Dobra, Sparse graphical models for exploring gene expression data, J. Multivariate Anal. 90 pp 196– (2004) · Zbl 1047.62104 · doi:10.1016/j.jmva.2004.02.009
[6] Dykstra, Establishing the positive definiteness of the sample covariance matrix, Ann. Math. Stat. 41 pp 2153– (1970) · Zbl 0212.22202 · doi:10.1214/aoms/1177696719
[7] Ellson, Graphviz and dynagraph - static and dynamic graph drawing tools pp 127– (2004)
[8] Friedman, Inferring cellular networks using probabilistic graphical models, Science 303 pp 799– (2004) · doi:10.1126/science.1094068
[9] Friedman, Using Bayesian networks to analyze expression data, J. Comput. Biol. 7 pp 601– (2000) · doi:10.1089/106652700750050961
[10] Heckerman, D.; Geiger, D., Learning Gaussian networks, Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence (1994)
[11] Geiger, Parameter priors for directed acyclic graphical models and the characterization of several probability distributions, Ann. Statist. 30 pp 1412– (2002) · Zbl 1016.62064 · doi:10.1214/aos/1035844981
[12] Irizarry, Exploration, normalization, and summaries of high density oligonucleotide array probe level data, Biostatistics 4 pp 249– (2003) · Zbl 1141.62348 · doi:10.1093/biostatistics/4.2.249
[13] Kanehisa, KEGG for representation and analysis of molecular networks involving diseases and drugs, Nucleic acids research 38 pp D355– (2010) · doi:10.1093/nar/gkp896
[14] Kasza, J.; Solomon, P., A comparison of score-based methods for estimating Bayesian networks using the Kullback-Leibler divergence (2011) · arXiv:1009.1463v2
[15] Kasza, Bayesian networks for high-dimensional data with complex mean structure (2009)
[16] Koller, Probabilistic graphical models (2009) · Zbl 1183.68483
[17] Kotak, Complexity of the heat stress response in plants, Current opinion in plant biology 10 pp 310– (2007) · doi:10.1016/j.pbi.2007.04.011
[18] Lauritzen, Graphical Models (2004) · Zbl 0907.62001
[19] Markowetz, Inferring cellular networks - a review, BMC Bioinformatics 8 pp S5– (2007) · doi:10.1186/1471-2105-8-S6-S5
[20] R: A Language and Environment for Statistical Computing (2007)
[21] Robinson, Molecular biology of grape berry ripening, Australian Journal of Grape and Wine Research 6 pp 175– (2000) · doi:10.1111/j.1755-0238.2000.tb00177.x
[22] Smyth, Linear models and empirical Bayes methods for assessing differential expression in microarray experiments, Stat. Appl. Genet. Mol. Biol. 3 (2004) · Zbl 1038.62110
[23] Speed, Encyclopedia of Statistical Sciences Update Volume 1 (1997)
[24] Spirtes, Causation, Prediction, and Search (1993) · doi:10.1007/978-1-4612-2748-9
[25] Wang, Role of plant heat-shock proteins and molecular chaperones in the abiotic stress response, Trends in Plant Science 9 pp 244– (2004) · doi:10.1016/j.tplants.2004.03.006
[26] Wermuth, Linear recursive equations, covariance selection, and path analysis, J. Amer. Statist. Assoc. 75 pp 963– (1980) · Zbl 0475.62056 · doi:10.1080/01621459.1980.10477580
[27] Yabe, Analysis of tissue-specific expression of Arabidopsis thaliana HSP90-family gene HSP81, Plant cell physiology 35 pp 1207– (1994) · doi:10.1093/oxfordjournals.pcp.a078715
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.