
Partitioned hybrid learning of Bayesian network structures. (English) Zbl 07570141

Summary: We develop a novel hybrid method for Bayesian network structure learning called partitioned hybrid greedy search (pHGS), composed of three distinct yet compatible new algorithms: Partitioned PC (pPC) accelerates skeleton learning via a divide-and-conquer strategy, \(p\)-value adjacency thresholding (PATH) effectively accomplishes parameter tuning with a single execution, and hybrid greedy initialization (HGI) maximally utilizes constraint-based information to obtain a high-scoring and well-performing initial graph for greedy search. We establish structure learning consistency of our algorithms in the large-sample limit, and empirically validate our methods individually and collectively through extensive numerical comparisons. The combined merits of pPC and PATH achieve significant computational reductions compared to the PC algorithm without sacrificing the accuracy of estimated structures, and our generally applicable HGI strategy reliably improves the structural estimation accuracy of popular hybrid algorithms with negligible additional computational expense. Our results demonstrate the competitive performance of pHGS against many state-of-the-art structure learning algorithms.
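
The summary describes PATH only at a high level. As a rough illustration of its single-execution tuning idea, the sketch below assumes that one liberal PC-style run records, for each surviving edge, the largest \(p\)-value among the conditional-independence tests applied to it; skeletons for any stricter significance level can then be read off by filtering, without re-running the tests. This is a minimal sketch under that assumption, not the authors' implementation, and all names (path_threshold, recorded) are hypothetical.

```python
# Hedged sketch of a single-execution thresholding idea in the spirit of PATH.
# Assumption: in a PC-style run, the edge u-v is deleted once some test of
# u independent of v given S yields a p-value above the significance level
# alpha, so retaining the maximal p-value observed per surviving edge lets
# stricter (smaller) alpha values be applied afterwards without new tests.

def path_threshold(candidate_edges, alpha):
    """Return the skeleton edges retained at significance level alpha.

    candidate_edges: dict mapping an undirected edge (u, v) to the maximal
    p-value recorded over the conditional-independence tests performed for
    that pair during one run at a liberal level alpha_max >= alpha. An edge
    survives at level alpha when no test's p-value exceeded alpha.
    """
    return {edge for edge, p in candidate_edges.items() if p <= alpha}

# Example: maximal p-values per edge from one liberal run (toy numbers).
recorded = {("A", "B"): 0.001, ("B", "C"): 0.04, ("C", "D"): 0.20}

# Skeletons for several stricter alpha values, all from the same single run.
for alpha in (0.01, 0.05):
    print(alpha, sorted(path_threshold(recorded, alpha)))
```

Running the toy example keeps only the A-B edge at \(\alpha = 0.01\) and both A-B and B-C at \(\alpha = 0.05\), matching the intuition that smaller significance levels yield sparser skeletons.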

MSC:

68T05 Learning and adaptive systems in artificial intelligence
