×

Inferring gene regulatory networks by an order independent algorithm using incomplete data sets. (English) Zbl 1514.62384

Summary: Analyzing incomplete data for inferring the structure of gene regulatory networks (GRNs) is a challenging task in bioinformatic. Bayesian network can be successfully used in this field. \(k\)-nearest neighbor, singular value decomposition (SVD)-based and multiple imputation by chained equations are three fundamental imputation methods to deal with missing values. Path consistency (PC) algorithm based on conditional mutual information (PCA-CMI) is a famous algorithm for inferring GRNs. This algorithm needs the data set to be complete. However, the problem is that PCA-CMI is not a stable algorithm and when applied on permuted gene orders, different networks are obtained. We propose an order independent algorithm, PCA-CMI-OI, for inferring GRNs. After imputation of missing data, the performances of PCA-CMI and PCA-CMI-OI are compared. Results show that networks constructed from data imputed by the SVD-based method and PCA-CMI-OI algorithm outperform other imputation methods and PCA-CMI. An undirected or partially directed network is resulted by PC-based algorithms. Mutual information test (MIT) score, which can deal with discrete data, is one of the famous methods for directing the edges of resulted networks. We also propose a new score, ConMIT, which is appropriate for analyzing continuous data. Results shows that the precision of directing the edges of skeleton is improved by applying the ConMIT score.

MSC:

62-XX Statistics
Full Text: DOI

References:

[1] S. Acid and L.M. de Campos, A hybrid methodology for learning belief networks: Benedict, Internat. J. Approx. Reason. 27(3) (2001), pp. 235-262. · Zbl 0983.68197 · doi:10.1016/S0888-613X(01)00041-X
[2] E. Acuna and C. Rodriguez, The treatment of missing values and its effect on classifier accuracy, in Classification, Clustering, and Data Mining Applications, D. Banks, L. House, F.R. McMorris, P. Arabie, and W. Gaul, eds., Springer-Verlag, Berlin-Heidelberg, 2004, pp. 639-647. · doi:10.1007/978-3-642-17103-1_60
[3] R. Aghdam, M. Ganjali, C. Eslahchi, and L. Kaderali, IPCA-CMI: An algorithm for inferring gene regulatory networks based on a combination of PCA-CMI and MIT score, PloS One 9(4) (2014), pp. 1-10. · doi:10.1371/journal.pone.0092600
[4] R. Aghdam, M. Ganjali, X. Zhang, and C. Eslahchi, CN: A consensus algorithm for inferring gene regulatory networks using the SORDER algorithm and conditional mutual information test, Mol. BioSyst. 11(3) (2015), pp. 942-949. · doi:10.1039/C4MB00413B
[5] N.A. Ahmed and D.V. Gokhale, Entropy expressions and their estimators for multivariate distributions, IEEE Trans. Inform. Theory 35(3) (1989), pp. 688-692. · Zbl 0672.62005 · doi:10.1109/18.30996
[6] T. Aittokallio, Dealing with missing values in large-scale studies: Microarray data imputation and beyond, Briefings in Bioinform. 11 (2010), pp. 253-264. · doi:10.1093/bib/bbp059
[7] H. Akaike, A new look at the statistical model identification, IEEE Trans. Automat. Control 19(6) (1974), pp. 716-723. · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[8] S.A. Andersson, D. Madigan, and M.D. Perlman, A characterization of Markov equivalence classes for acyclic digraphs, Ann. Statist. 25(2) (1997), pp. 505-541. · Zbl 0876.60095 · doi:10.1214/aos/1031833662
[9] R.R. Bouckaert, Bayesian belief networks: From construction to inference, Ph.D. thesis, Universiteit Utrecht, Faculteit Wiskunde en Informatica, Netherlands, 1995.
[10] W. Buntine, Theory refinement on Bayesian networks, UAI91 Proceedings of the Seventh conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1991, pp. 52-60.
[11] L.F. Burgette and J.P. Reiter, Multiple imputation for missing data via sequential regression trees, Am. J. Epidemiol. 172 (2010), pp. 1070-1076. · doi:10.1093/aje/kwq260
[12] J. Catlett, On changing continuous attributes into ordered discrete attributes, in Machine Learning-EWSL-91, Proc. of the European Working Session on Learning, Y. Kodratoff, ed., Springer, Porto, Portugal, 1991, pp. 164-178. · doi:10.1007/BFb0017012
[13] D. Chickering, D. Geiger, and D. Heckerman, Learning Bayesian networks: Search methods and experimental results, Proceedings of Fifth Conference on Artificial Intelligence and Statistics, 1995, pp. 112-128.
[14] C. Chow and C. Liu, Approximating discrete probability distributions with dependence trees, IEEE Trans. Inform. Theory 14(3) (1968), pp. 462-467. · Zbl 0165.22305 · doi:10.1109/TIT.1968.1054142
[15] T. Claassen, J. Mooij, and T. Heskes, Learning sparse causal models is not np-hard, preprint (2013). Available at arXiv:1309.6824.
[16] D. Colombo and M.H. Maathuis, Order-independent constraint-based causal structure learning, J. Mach. Learn. Res. 15 (2014), pp. 3741-3782.
[17] D. Colombo, M.H. Maathuis, M. Kalisch, and T.S. Richardson, Learning high-dimensional directed acyclic graphs with latent and selection variables, Ann. Statist. 40(1) (2012), pp. 294-321. · Zbl 1246.62131 · doi:10.1214/11-AOS940
[18] G.F. Cooper and E. Herskovits, A Bayesian method for the induction of probabilistic networks from data, Mach. Learn. 9(4) (1992), pp. 309-347. · Zbl 0766.68109 · doi:10.1007/BF00994110
[19] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley & Sons, Hoboken, NJ, 2012.
[20] R.G. Cowell, Parameter estimation from incomplete data for Bayesian networks, in Artificial Intelligence and Statistics 99: Proceedings of the 7th International Workshop on Artificial Intelligence and Statistics, D. Heckerman, and J. Whittaker, eds., San Francisco, Morgan Kaufmann, 1999, pp. 193-196.
[21] L.M. De Campos, A scoring function for learning Bayesian networks based on mutual information and conditional independence tests, J. Mach. Learn. Res. 7 (2006), pp. 2149-2187. · Zbl 1222.62036
[22] M. Di Zio, G. Sacco, M. Scanu, and P. Vicard, Multivariate techniques for imputation based on Bayesian networks, Neural Netw. World 15(4) (2005), pp. 303-310.
[23] M. Di Zio, M. Scanu, L. Coppola, O. Luzi, and A. Ponti, Bayesian networks for imputation, J. R. Stat. Soc. Ser. A 167(2) (2004), pp. 309-322. · Zbl 1408.62043 · doi:10.1046/j.1467-985X.2003.00736.x
[24] N. Dojer, A. Gambin, A. Mizera, B. Wilczyński, and J. Tiuryn, Applying dynamic Bayesian networks to perturbed gene expression data, BMC Bioinformatics 7(1) (2006), pp. 1-11. doi:10.1186/1471-2105-7-249 · doi:10.1186/1471-2105-7-249
[25] J. Dougherty, R. Kohavi, M. Sahami, Supervised and unsupervised discretization of continuous features, Machine Learning: Proceedings Of the Twelfth International Conference, Vol. 12, 1995, pp. 194-202.
[26] R. Drake and A. Guha, A mutual information-based \(k\)-sample test for discrete distributions, J. Appl. Statist. 41(9) (2014), pp. 2011-2027. · Zbl 1352.62063
[27] E. Faulkner, K2ga: Heuristically guided evolution of Bayesian network structures from data, IEEE Symposium on Computational Intelligence and Data Mining, 2007, CIDM 2007, pp. 18-25.
[28] C. Forbes, M. Evans, N. Hastings, and B. Peacock, Statistical Distributions, John Wiley & Sons, Canada, 2011. · Zbl 1258.62012
[29] N. Friedman and M. Goldszmidt, Learning Bayesian networks with local structure, in Learning in Graphical Models, M.I. Jordan, ed., Springer, Netherlands, 1998, pp. 421-459. · Zbl 0910.68176 · doi:10.1007/978-94-011-5014-9_15
[30] N. Friedman, M. Linial, I. Nachman, and D. Pe’er, Using Bayesian networks to analyze expression data, J. Comput. Biol. 7(3-4) (2000), pp. 601-620. · doi:10.1089/106652700750050961
[31] J.A. Gámez, J.L. Mateo, and J.M. Puerta, A fast hill-climbing algorithm for Bayesian networks structure learning, in Symbolic and Quantitative Approaches to Reasoning with Uncertainty, 9th European Conference, K. Mellouli, ed., Springer, Hammamet, Tunisia, 2007, pp. 585-597. · Zbl 1148.68514 · doi:10.1007/978-3-540-75256-1_52
[32] D. Heckerman, D. Geiger, and D.M. Chickering, Learning Bayesian networks: The combination of knowledge and statistical data, Mach. Learn. 20(3) (1995), pp. 197-243. · Zbl 0831.68096 · doi:10.1007/BF00994016
[33] E.R. Hruschka Jr., E.R. Hruschka, and N.F. Ebecken, Bayesian networks for imputation in classification problems, J. Intell. Inf. Syst. 29(3) (2007), pp. 231-252. · doi:10.1007/s10844-006-0016-x
[34] S. Imoto, T. Goto, S. Miyano, Estimation of genetic networks and functional structures between genes by using Bayesian networks and nonparametric regression, Pacific Symposium on Biocomputing, Vol. 7, World Scientific, 2002, pp. 175-186.
[35] M. Kalisch, M. Mächler, D. Colombo, M.H. Maathuis, and P. Bühlmann, Causal inference using graphical models with the r package pcalg, J. Statist. Softw. 47(11) (2012), pp. 1-26. · doi:10.18637/jss.v047.i11
[36] M. Kayaalp and G.F. Cooper, A Bayesian network scoring metric that is based on globally uniform parameter priors, Proceedings of the Eighteenth Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 2002, pp. 251-258.
[37] R. Kerber, Chimerge: Discretization of numeric attributes, Proceedings of the Tenth National Conference on Artificial Intelligence, Aaai Press, 1992, pp. 123-128.
[38] S. Kullback, Information Theory and Statistics, Courier Corporation, New York, 1997. · Zbl 0897.62003
[39] W. Lam and F. Bacchus, Learning Bayesian belief networks: An approach based on the MDL principle, Comput. Intell. 10(3) (1994), pp. 269-293. · doi:10.1111/j.1467-8640.1994.tb00166.x
[40] R.J. Little and D.B. Rubin, Statistical Analysis with Missing Data, John Wiley & Sons, Hoboken, NJ, 2002. · Zbl 1011.62004 · doi:10.1002/9781119013563
[41] M.H. Maathuis, M. Kalisch, P. Bühlmann, et al. Estimating high-dimensional intervention effects from observational data, Ann. Statist. 37(6A) (2009), pp. 3133-3164. · Zbl 1191.62118 · doi:10.1214/09-AOS685
[42] D.J. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press, Cambridge, 2003. · Zbl 1055.94001
[43] L.T. MacNeil and A.J.M. Walhout, Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression, Genome Res. 21(5) (2011), pp. 645-657. · doi:10.1101/gr.097378.109
[44] A. Madar, A. Greenfield, E. Vanden-Eijnden, R. Bonneau, and M. Isalan, Dream3: Network inference using dynamic context likelihood of relatedness and the Inferelator, PloS One 5(3) (2010), pp. 1-13. doi:10.1371/journal.pone.0009803 · doi:10.1371/journal.pone.0009803
[45] D. Madigan, S.A. Andersson, M.D. Perlman, and C.T. Volinsky, Bayesian model averaging and model selection for Markov equivalence classes of acyclic digraphs, Comm. Statist. Theory Methods 25(11) (1996), pp. 2493-2519. · Zbl 0894.62032
[46] D. Marbach, R.J. Prill, T. Schaffter, C. Mattiussi, D. Floreano, and G. Stolovitzky, Revealing strengths and weaknesses of methods for gene network inference, Proc. Natl. Acad. Sci. 107(14) (2010), pp. 6286-6291. · doi:10.1073/pnas.0913357107
[47] D. Marbach, T. Schaffter, C. Mattiussi, and D. Floreano, Generating realistic in silico gene networks for performance assessment of reverse engineering methods, J. Comput. Biol. 16(2) (2009), pp. 229-239. · doi:10.1089/cmb.2008.09TT
[48] A.A. Margolin, K. Wang, W.K. Lim, M. Kustagi, I. Nemenman, and A. Califano, Reverse engineering cellular networks, Nat. Protoc. 1(2) (2006), pp. 662-671. · doi:10.1038/nprot.2006.106
[49] K. Moorthy, M. Saberi Mohamad, and S. Deris, A review on missing value imputation algorithms for microarray gene expression data, Curr. Bioinf. 9(1) (2014), pp. 18-22. · doi:10.2174/1574893608999140109120957
[50] T.D. Nielsen and F.V. Jensen, Bayesian Networks and Decision Graphs, Springer Science & Business Media, New York, 2009.
[51] P. Niloofar and M. Ganjali, A new multivariate imputation method based on Bayesian networks, J. Appl. Statist. 41(3) (2014), pp. 501-518. · Zbl 1514.62774
[52] S. Oba, M.-A. Sato, I. Takemasa, M. Monden, K.-I. Matsubara, and S. Ishii, A Bayesian missing value estimation method for gene expression profile data, Bioinformatics 19(16) (2003), pp. 2088-2096. · doi:10.1093/bioinformatics/btg287
[53] J. Pearl, Causality: Models, Reasoning and Inference, Vol. 29, Cambridge University Press, Cambridge, 2000. · Zbl 0959.68116
[54] J. Pearl, Causal inference in statistics: An overview, Statist. Surv. 3 (2009), pp. 96-146. · Zbl 1300.62013 · doi:10.1214/09-SS057
[55] D. Pe’er, E. Regev, G. Elidan, and N. Friedman, Inferring subnetworks from perturbed expression profiles, Bioinformatics 17(Suppl 1) (2001), pp. S215-S224. · doi:10.1093/bioinformatics/17.suppl_1.S215
[56] R.J. Prill, D. Marbach, J. Saez-Rodriguez, P.K. Sorger, L.G. Alexopoulos, X. Xue, N.D. Clarke, G. Altan-Bonnet, G. Stolovitzky, and M. Isalan, Towards a rigorous assessment of systems biology models: The dream3 challenges, PloS One 5(2) (2010), pp. 1-18. doi:10.1371/journal.pone.0009202 · doi:10.1371/journal.pone.0009202
[57] M. Ramoni and P. Sebastiani, Robust learning with missing data, Mach. Learn. 45(2) (2001), pp. 147-170. · Zbl 1007.68154 · doi:10.1023/A:1010968702992
[58] C. Riggelsen, Learning parameters of Bayesian networks from incomplete data via importance sampling, Internat. J. Approx. Reason. 42(1-2) (2006), pp. 69-83. · Zbl 1096.68706 · doi:10.1016/j.ijar.2005.10.005
[59] J. Ruan and M. Isalan, A top-performing algorithm for the DREAM3 gene expression prediction challenge, PloS One 5(2) (2010), pp. 1-18. doi:10.1371/journal.pone.0008944 · doi:10.1371/journal.pone.0008944
[60] B.D. Rubin, Inference and missing data, Biometrika 63 (1976), pp. 581-592. · Zbl 0344.62034 · doi:10.1093/biomet/63.3.581
[61] G. Schwarz, Estimating the dimension of a model, Ann. Statist. 6(2) (1978), pp. 461-464. · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[62] P. Spirtes, An anytime algorithm for causal inference, Proc. of the Eighth International Workshop on Artificial Intelligence and Statistics, Citeseer, 2001, pp. 213-221.
[63] P. Spirtes, C. Glymour, and R. Scheines, Causation, Prediction, and Search, Vol. 81, The MIT Press, USA, 2000. · Zbl 0806.62001
[64] P. Spirtes, C. Meek, and T. Richardson, Causal inference in the presence of latent variables and selection bias, Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann Publishers Inc., 1995, pp. 499-506.
[65] G. Stolovitzky, R.J. Prill, and A. Califano, Lessons from the dream2 challenges, Ann. N. Y. Acad. Sci. 1158(1) (2009), pp. 159-195. · doi:10.1111/j.1749-6632.2009.04497.x
[66] O.G. Troyanskaya, M. Cantor, G. Sherlock, P.O. Brown, T. Hastie, R. Tibshirani, D. Botstein, and R.B. Altman, Missing value estimation methods for DNA microarrays, Bioinformatics 17 (2001), pp. 520-525. · doi:10.1093/bioinformatics/17.6.520
[67] I. Tsamardinos, L.E. Brown, and C.F. Aliferis, The max-min hill-climbing Bayesian network structure learning algorithm, Mach. Learn. 65(1) (2006), pp. 31-78. · Zbl 1470.68192 · doi:10.1007/s10994-006-6889-7
[68] T.V.J. udea Pearl, Equivalence and synthesis of causal models, Proceedings of Sixth Conference on Uncertainty in Artijicial Intelligence, 1991, pp. 220-227.
[69] K. Wang, M. Saito, B.C. Bisikirska, M.J. Alvarez, W.K. Lim, P. Rajbhandari, Q. Shen, I. Nemenman, K. Basso, A.A. Margolin, U. Klein, R. Dalla-Favera, and A. Califano, Genome-wide identification of post-translational modulators of transcription factor activity in human b cells, Nat. Biotechnol. 27(9) (2009), pp. 829-837. · doi:10.1038/nbt.1563
[70] J. Zhang, On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias, Artif. Intell. 172(16) (2008), pp. 1873-1896. · Zbl 1184.68434 · doi:10.1016/j.artint.2008.08.001
[71] X. Zhang, K. Liu, Z.-P. Liu, B. Duval, J.-M. Richer, X.-M. Zhao, J.-K. Hao, and L. Chen, NARROMI: A noise and redundancy reduction technique improves accuracy of gene regulatory network inference, Bioinformatics 29(1) (2013), pp. 106-113. · doi:10.1093/bioinformatics/bts619
[72] X. Zhang, X.-M. Zhao, K. He, L. Lu, Y. Cao, J. Liu, J.-K. Hao, Z.-P. Liu, and L. Chen, Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information, Bioinformatics 28(1) (2012), pp. 98-104. · doi:10.1093/bioinformatics/btr626
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.