Abstract
Causal discovery is one of the most important research directions in machine learning, aiming to uncover the underlying causal relationships in observed data. In practice, the time complexity of causal discovery grows exponentially with the number of variables. To alleviate this problem, many methods based on divide-and-conquer strategies have been proposed. Existing methods usually partition the variables heuristically using scattered variables, which makes it difficult to minimize the vertex cut-set C and thus degrades causal discovery performance. In this work, we design an elaborate causal partition strategy called Causal Partition Base Graph (CPBG) to solve this problem. CPBG uses a set of low-order conditional independence (CI) tests to construct a rough skeleton S from the observed data and applies a heuristic search over S for the optimal vertex cut-set C. The variables can then be partitioned into multiple subsets, a causal discovery method can be run on each subset, and the complete causal structure is obtained by merging the partial results. The proposed method is evaluated on various real-world causal datasets. Experimental results show that CPBG outperforms its existing counterparts, demonstrating that it supports more effective and efficient causal discovery. The source code of the proposed method and all experimental results are available at https://github.com/DreamEdm/Causal.
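The divide-and-conquer pipeline the abstract describes can be sketched as follows. This is a minimal illustrative toy, not the paper's CPBG implementation: the oracle CI test, the highest-degree cut heuristic, and all names (`build_rough_skeleton`, `greedy_vertex_cut`, the chain example) are assumptions introduced for illustration only.

```python
from itertools import combinations

def build_rough_skeleton(variables, ci_test, max_order=1):
    """Connect x-y unless some CI test of order <= max_order separates them."""
    edges = set()
    for x, y in combinations(sorted(variables), 2):
        if ci_test(x, y, ()):
            continue  # marginally independent: no edge
        conds = [(z,) for z in variables if z not in (x, y)] if max_order >= 1 else []
        if not any(ci_test(x, y, c) for c in conds):
            edges.add(frozenset((x, y)))
    return edges

def connected_components(nodes, edges):
    nodes, comps = set(nodes), []
    while nodes:
        frontier = [nodes.pop()]
        comp = set(frontier)
        while frontier:
            v = frontier.pop()
            for e in edges:
                if v in e:
                    (w,) = set(e) - {v}
                    if w in nodes:
                        nodes.remove(w); comp.add(w); frontier.append(w)
        comps.append(comp)
    return comps

def greedy_vertex_cut(variables, edges):
    """Remove highest-degree vertices until the skeleton splits (toy heuristic)."""
    cut = set()
    while True:
        rest = set(variables) - cut
        comps = connected_components(rest, edges)
        if len(comps) != 1:
            return cut, comps
        degree = {v: sum(v in e for e in edges) for v in sorted(rest)}
        cut.add(max(degree, key=degree.get))

# Demo: oracle CI test for a chain A - B - C - D - E, where two variables are
# independent given a conditioning set iff it contains a node between them.
pos = {v: i for i, v in enumerate("ABCDE")}
def oracle(x, y, cond):
    return any(min(pos[x], pos[y]) < pos[z] < max(pos[x], pos[y]) for z in cond)

skeleton = build_rough_skeleton(pos, oracle)       # recovers the chain edges
cut, parts = greedy_vertex_cut(pos, skeleton)      # e.g. cut = {"B"}
subproblems = [part | cut for part in parts]       # one block per component + C
```

In the full method, a causal discovery algorithm would then be run on each element of `subproblems` and the partial structures merged; the low-order tests keep the skeleton step cheap, while a small cut-set C keeps the merge overlap small.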
Notes
Network data provided by Bnlearn: https://www.bnlearn.com/bnrepository/
Acknowledgements
This work is supported by the National Natural Science Foundation (No. 61971052, 62006051), National Key R&D Program of China (No. 2020YFC2004300), Science and Technology Planning Project of Guangdong Province, China (No. 2019B101001021, 2020B1010010010, Z20077), the Project of Young Innovative Talents in Colleges and Universities in Guangdong Province (No. 2020KQNCX049), Guangdong Provincial Key Laboratory (No. 2020B121201001), the Research Project of Guangdong Provincial Department of Education (No. 2021KTSCX070, 2021KCXTD038), Guangdong Basic and Applied Basic Research Foundation (No. 2021A1515011995, 2022A1515011551), Doctor Starting Fund of Hanshan Normal University, China (No. QD20190628, QD2021201), Scientific Research Talents Fund of Hanshan Normal University, China (No. Z19113), School of Intelligent Manufacturing Industry of Hanshan Normal University (E22022), Scientific Research Project of Guangdong Provincial Department of Education (2022ZDZX4031), Research Platform Project of Hanshan Normal University (PNB221102), and Guangdong Provincial Key Laboratory of Data Science and Intelligent Education (2022KSYS003).
Author information
Authors and Affiliations
Corresponding authors
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
A.1 The scores and standard deviations of all metrics
In this appendix, we report the standard deviations for each dataset. The values in parentheses are the standard deviations of the averages in the corresponding positions, and all bolded values are the best results in the experiment.
To improve readability, we split the results for each sample size across two tables: Tables 8 and 9 report how the different methods perform with 100 samples, Tables 10 and 11 report the results with 200 samples, and so on.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Hong, Y., Guo, J., Mai, G. et al. High-dimensional causal discovery based on heuristic causal partitioning. Appl Intell 53, 23768–23796 (2023). https://doi.org/10.1007/s10489-023-04530-7