Abstract
Deep learning techniques are increasingly being applied to machine learning tasks that use Knowledge Graphs as input data. However, these techniques typically learn a latent representation for the entities of interest internally, which is then used to make decisions. This latent representation is often not comprehensible to humans, which is why deep learning techniques are often considered to be black boxes. In this paper, we present INK: Instance Neighbouring by using Knowledge, a novel technique to learn binary, feature-based representations for nodes of interest in a knowledge graph that are comprehensible to humans. We demonstrate the predictive power of the node representations obtained through INK by feeding them to classical machine learning techniques and comparing their predictive performance on the node classification task to the current state of the art: Relational Graph Convolutional Networks (R-GCN) and RDF2Vec. We perform this comparison both on benchmark datasets and on a real-world use case.
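The abstract only names the core idea; as a rough illustration, the binary, neighbourhood-based node representations it describes can be sketched as follows. This is a toy sketch of the general principle, not the authors' implementation: the function name `ink_features`, the feature encoding (path strings `p` and `p=o`), and the example triples are all illustrative choices, not taken from the paper.

```python
def ink_features(triples, nodes, depth=1):
    """Map each node of interest to a set of human-readable binary features.

    For every relation path up to `depth` hops, two features are emitted:
    the path itself ("has this relation") and the path with its end value
    ("has this relation with this specific value").
    """
    adj = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))

    feats = {}
    for n in nodes:
        fs = set()
        frontier = [("", n)]  # (relation path so far, current node)
        for _ in range(depth):
            nxt = []
            for prefix, node in frontier:
                for p, o in adj.get(node, []):
                    path = p if not prefix else prefix + "." + p
                    fs.add(path)            # binary feature: relation path exists
                    fs.add(path + "=" + o)  # binary feature: path ends in this value
                    nxt.append((path, o))
            frontier = nxt
        feats[n] = fs
    return feats

# Toy knowledge graph as (subject, predicate, object) triples.
triples = [
    ("alice", "worksAt", "ugent"),
    ("bob", "worksAt", "ugent"),
    ("ugent", "locatedIn", "ghent"),
]
f = ink_features(triples, ["alice", "bob"], depth=2)
```

One-hot encoding such feature sets into a sparse binary matrix and feeding it to a classical classifier is what makes the resulting model inspectable: every column corresponds to a readable statement about a node's neighbourhood (e.g. `worksAt.locatedIn=ghent`) rather than a latent dimension.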
References
Anelli VW, Noia TD, Sciascio ED, Ragone A, Trotta J (2019) How to make latent factors interpretable by feeding factorization machines with knowledge graphs. arXiv:1909.05038
Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives Z (2007) DBpedia: a nucleus for a web of open data. In: The semantic web, pp 722–735. Springer
Baldassarre F, Azizpour H (2019) Explainability techniques for graph convolutional networks. arXiv preprint arXiv:1905.13686
Bordes A, Usunier N, Garcia-Duran A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Burges CJC, Bottou L, Welling M, Ghahramani Z, Weinberger KQ (eds) Advances in Neural information processing systems, vol. 26. Curran Associates, Inc
De Boer V, Wielemaker J, Van Gent J, Hildebrand M, Isaac A, Van Ossenbruggen J, Schreiber G (2012) Supporting linked data production for cultural heritage institutes: the Amsterdam museum case study. In: Extended semantic web conference, pp 733–747. Springer
Ehrlinger L, Wöß W (2016) Towards a definition of knowledge graphs. SEMANTiCS (Posters, Demos, SuCCESS) 48:1–4
Gulisano V, Jerzak Z, Katerinenko R, Strohbach M, Ziekow H (2017) The DEBS 2017 grand challenge. In: Proceedings of the 11th ACM international conference on distributed and event-based systems, DEBS '17, pp 271–273. Association for Computing Machinery, New York, NY, USA. https://doi.org/10.1145/3093742.3096342
Gunel B (2019) Robust relational graph convolutional networks
Hamilton WL, Ying R, Leskovec J (2017) Representation learning on graphs: methods and applications. arXiv preprint arXiv:1709.05584
Kazemi SM, Poole D (2018) Simple embedding for link prediction in knowledge graphs. arXiv preprint arXiv:1802.04868
Khalid S, Khalil T, Nasreen S (2014) A survey of feature selection and feature extraction techniques in machine learning. In: 2014 Science and information conference, pp 372–378. IEEE
Kipf TN, Welling M (2016) Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907
Krech D (2006) RDFLib: a Python library for working with RDF
Lecue F (2020) On the role of knowledge graphs in explainable AI. Semantic Web 11(1):41–51
Lin Y, Liu Z, Luan H, Sun M, Rao S, Liu S (2015a) Modeling relation paths for representation learning of knowledge bases. arXiv preprint arXiv:1506.00379
Lin Y, Liu Z, Sun M, Liu Y, Zhu X (2015b) Learning entity and relation embeddings for knowledge graph completion. In: Proceedings of the AAAI conference on artificial intelligence, vol. 29
Lösch U, Bloehdorn S, Rettinger A (2012) Graph kernels for RDF data. In: Extended semantic web conference, pp 134–148. Springer
Lundberg SM, Lee SI (2017) A unified approach to interpreting model predictions. In: Guyon I, Luxburg UV, Bengio S, Wallach H, Fergus R, Vishwanathan S, Garnett R (eds) Advances in neural information processing systems 30, pp 4765–4774. Curran Associates, Inc. http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf
Marzagao DK, Huynh TD, Helal A, Moreau L (2020) Provenance graph kernel. arXiv preprint arXiv:2010.10343
Mikolov T, Chen K, Corrado G, Dean J (2013a) Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013b) Distributed representations of words and phrases and their compositionality. In: Advances in neural information processing systems, pp 3111–3119
Miller E (1998) An introduction to the resource description framework. Bull Am Soc Inf Sci Technol 25(1):15–19
Neil D, Briody J, Lacoste A, Sim A, Creed P, Saffari A (2018) Interpretable graph convolutional neural networks for inference on noisy knowledge graphs. arXiv preprint arXiv:1812.00279
Nguyen DQ, Nguyen TD, Nguyen DQ, Phung D (2017) A novel embedding model for knowledge base completion based on convolutional neural network. arXiv preprint arXiv:1712.02121
Nickel M, Tresp V, Kriegel HP (2011) A three-way model for collective learning on multi-relational data. In: ICML
Nickel M, Murphy K, Tresp V, Gabrilovich E (2015) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33
Paulheim H (2012) Generating possible interpretations for statistics from linked open data. In: Extended semantic web conference, pp 560–574. Springer
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V et al (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Portisch J, Hladik M, Paulheim H (2021) FinMatcher at FinSim-2: hypernym detection in the financial services domain using knowledge graphs. arXiv preprint arXiv:2103.01576
Ristoski P, Paulheim H, Svátek V, Zeman V (2015) The linked data mining challenge 2015. In: KNOW@LOD
Ristoski P, De Vries GKD, Paulheim H (2016a) A collection of benchmark datasets for systematic evaluations of machine learning on the semantic web. In: International semantic web conference, pp 186–194. Springer
Ristoski P, Paulheim H, Svátek V, Zeman V (2016b) The linked data mining challenge 2016. In: (KNOW@LOD/CoDeS)@ESWC
Ristoski P, Rosati J, Di Noia T, De Leone R, Paulheim H (2019) RDF2Vec: RDF graph embeddings and their applications. Semantic Web 10(4):721–752
Ristoski P, Gentile AL, Alba A, Gruhl D, Welch S (2020) Large-scale relation extraction from web documents and knowledge graphs with human-in-the-loop. J Web Semantics 60:100546
Sabour S, Frosst N, Hinton GE (2017) Dynamic routing between capsules. arXiv preprint arXiv:1710.09829
Schlichtkrull M, Kipf TN, Bloem P, Van Den Berg R, Titov I, Welling M (2018) Modeling relational data with graph convolutional networks. In: European semantic web conference, pp 593–607. Springer
Tan Z, Zhao X, Fang Y, Xiao W (2018) Gtrans: generic knowledge graph embedding via multi-state entities and dynamic relation spaces. IEEE Access 6:8232–8244
Taniar D, Rahayu JW (2006) Web semantics & ontology. IGI Global
Thanapalasingam T, van Berkel L, Bloem P, Groth P (2021) Relational graph convolutional networks: A closer look. arXiv preprint arXiv:2107.10015
Trouillon T, Welbl J, Riedel S, Gaussier E, Bouchard G (2016) Complex embeddings for simple link prediction. In: Proceedings of the 33rd international conference on machine learning - Volume 48, ICML'16, pp 2071–2080. JMLR.org
Stardog Union (2018) Stardog
Vandewiele G, Steenwinckel B, Ongenae F, De Turck F (2019) Inducing a decision tree with discriminative paths to classify entities in a knowledge graph. In: SEPDA2019, the 4th International workshop on semantics-powered data mining and analytics, pp 1–6
Vandewiele G, Steenwinckel B, Agozzino T, Weyns M, Bonte P, Ongenae F, De Turck F (2020a) pyRDF2Vec: Python implementation and extension of RDF2Vec. IDLab. https://github.com/IBCNServices/pyRDF2Vec
Vandewiele G, Steenwinckel B, Bonte P, Weyns M, Paulheim H, Ristoski P, De Turck F, Ongenae F (2020b) Walk extraction strategies for node embeddings with RDF2Vec in knowledge graphs. arXiv preprint arXiv:2009.04404
Voit MM, Paulheim H (2021) Bias in knowledge graphs—an empirical study with movie recommendation and different language editions of DBpedia. arXiv preprint arXiv:2105.00674
Vrandečić D, Krötzsch M (2014) Wikidata: a free collaborative knowledgebase. Commun ACM 57(10):78–85
Vu T, Nguyen TD, Nguyen DQ, Phung D, et al. (2019) A capsule network-based embedding model for knowledge graph completion and search personalization. In: Proceedings of the 2019 conference of the North American Chapter of the Association for computational linguistics: human language technologies, Volume 1 (Long and Short Papers), pp 2180–2189
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Proceedings of the AAAI conference on artificial intelligence, vol. 28
Wang M, Zheng D, Ye Z, Gan Q, Li M, Song X, Zhou J, Ma C, Yu L, Gai Y, Xiao T, He T, Karypis G, Li J, Zhang Z (2019) Deep graph library: a graph-centric, highly-performant package for graph neural networks. arXiv preprint arXiv:1909.01315
Wilcke X, Bloem P, De Boer V (2017) The knowledge graph as the default data model for learning on heterogeneous knowledge. Data Sci 1(1–2):39–57. https://doi.org/10.3233/DS-170007
Xiao H, Huang M, Hao Y, Zhu X (2015) Transg: a generative mixture model for knowledge graph embedding. arXiv preprint arXiv:1509.05488
Yanardag P, Vishwanathan S (2015) Deep graph kernels. In: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, pp 1365–1374
Yang B, Yih WT, He X, Gao J, Deng L (2014) Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575
Zhang Z, Cao L, Chen X, Tang W, Xu Z, Meng Y (2020) Representation learning of knowledge graphs with entity attributes. IEEE Access 8:7435–7441
Zouaq A, Martel F (2020) What is the schema of your knowledge graph? leveraging knowledge graph embeddings and clustering for expressive taxonomy learning. In: Proceedings of the international workshop on semantic big data, pp 1–6
Acknowledgements
Bram Steenwinckel (1SA0219N), Gilles Vandewiele (1S31417N) and Michael Weyns (1SD8821N) are funded by a strategic basic research grant of the Fund for Scientific Research Flanders (FWO). This research is part of the imec.ICON project PROTEGO (HBC.2019.2812), co-funded by imec, VLAIO, Televic, Amaron, Z-Plus and ML2Grow.
Ethics declarations
Reproducibility and code availability
The INK package and the code to run this evaluation pipeline are provided on GitHub. We also provide all experimental results as CSV files in this repository.
Additional information
Responsible editor: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Predictive performance results
This section provides detailed performance results for all datasets and classifiers defined in Sect. 6. For each classifier, the best mean accuracy over 5 runs is shown in italics and the overall best result is highlighted in bold. The standard deviation over these 5 runs is given in parentheses. Results that did not finish with our setup are denoted by '/' (Tables 8, 9, 10, 11, 12, 13 and 14).
Time measurements of best results
This section provides detailed information about the time required to generate the best results for all datasets defined in Sect. 6. For each obtained result, the mean time over 5 runs to 1) create the embedding, 2) train the classifier and 3) make the prediction is visualised as a function of the depth. No graphs were added for techniques that did not deliver useful results (Fig. 4).
Memory consumption of best results
This section provides detailed information about the amount of memory used to generate the best results for all datasets defined in Sect. 6. For each technique, the memory consumption of the internal representation is visualised as a function of the depth. No graphs were added for techniques that did not deliver useful results (Fig. 5).
About this article
Cite this article
Steenwinckel, B., Vandewiele, G., Weyns, M. et al. INK: knowledge graph embeddings for node classification. Data Min Knowl Disc 36, 620–667 (2022). https://doi.org/10.1007/s10618-021-00806-z