Computer Science > Databases

arXiv:2111.07761 (cs)

[Submitted on 15 Nov 2021 (v1), last revised 19 Jul 2022 (this version, v3)]

Title: EmbAssi: Embedding Assignment Costs for Similarity Search in Large Graph Databases

Authors: Franka Bause, Erich Schubert, Nils M. Kriege

Abstract: The graph edit distance is an intuitive measure to quantify the dissimilarity of graphs, but its computation is NP-hard and challenging in practice. We introduce methods for efficiently answering nearest neighbor and range queries with respect to this distance in large databases with up to millions of graphs. We build on the filter-verification paradigm, where lower and upper bounds are used to reduce the number of exact computations of the graph edit distance. Highly effective bounds for this involve solving a linear assignment problem for each graph in the database, which is prohibitive for massive datasets. Index-based approaches typically provide only weak bounds, leading to high computational costs for verification. In this work, we derive novel lower bounds for efficient filtering from restricted assignment problems, where the cost function is a tree metric. This special case allows embedding the costs of optimal assignments isometrically into $\ell_1$ space, rendering efficient indexing possible. We propose several lower bounds of the graph edit distance obtained from tree metrics reflecting the edit costs, which are combined for effective filtering. Our method, termed EmbAssi, can be integrated into existing filter-verification pipelines as a fast and effective pre-filtering step. Empirically, we show that for many real-world graphs our lower bounds are already close to the exact graph edit distance, while our index construction and search scale to very large databases.
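The embedding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It relies on a standard property of tree metrics: for two equal-size multisets of tree nodes, the optimal assignment cost equals the sum, over all tree edges, of the edge weight times the difference in the number of elements below that edge. Writing one weighted count per edge gives a vector whose $\ell_1$ distance matches the assignment cost. The toy tree (`PARENT`, `WEIGHT`), the function names, and the multisets `A` and `B` are hypothetical; how the paper derives tree metrics from edit costs is not shown here.

```python
from itertools import permutations

# Hypothetical toy tree (an assumption for illustration): node 0 is the root,
# PARENT[v] is the parent of v, WEIGHT[v] is the weight of edge (v, PARENT[v]).
PARENT = [None, 0, 0, 1, 1, 2]
WEIGHT = [0.0, 1.0, 2.0, 1.0, 3.0, 1.0]


def path_to_root(v):
    """Return the nodes on the path from v up to the root, inclusive."""
    path = []
    while v is not None:
        path.append(v)
        v = PARENT[v]
    return path


def tree_distance(u, v):
    """Sum of edge weights on the unique path between u and v."""
    anc_u, anc_v = path_to_root(u), path_to_root(v)
    common = set(anc_u) & set(anc_v)
    # Every node strictly below the lowest common ancestor contributes
    # the edge to its parent.
    return (sum(WEIGHT[x] for x in anc_u if x not in common)
            + sum(WEIGHT[x] for x in anc_v if x not in common))


def embed(elements):
    """Map a multiset of tree nodes to a vector with one entry per node x:
    WEIGHT[x] times the number of elements in the subtree rooted at x."""
    vec = [0.0] * len(PARENT)
    for v in elements:
        for x in path_to_root(v):
            vec[x] += WEIGHT[x]
    return vec


def l1(a, b):
    """Manhattan distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def optimal_assignment_cost(a, b):
    """Brute-force optimal assignment; only feasible for tiny inputs."""
    return min(sum(tree_distance(x, y) for x, y in zip(a, perm))
               for perm in permutations(b))


# Two hypothetical multisets of equal size.
A = [3, 3, 5]
B = [4, 2, 5]
print(l1(embed(A), embed(B)), optimal_assignment_cost(A, B))
```

Because the cost is an $\ell_1$ distance between fixed vectors, the vectors can be computed once per database object and handed to any index that supports Manhattan distance, which avoids solving an assignment problem per query-object pair.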

Comments:	Data Min Knowl Disc (2022)
Subjects:	Databases (cs.DB)
Cite as:	arXiv:2111.07761 [cs.DB]
	(or arXiv:2111.07761v3 [cs.DB] for this version)
	https://doi.org/10.48550/arXiv.2111.07761
Related DOI:	https://doi.org/10.1007/s10618-022-00850-3

Submission history

From: Franka Bause
[v1] Mon, 15 Nov 2021 14:03:15 UTC (427 KB)
[v2] Wed, 20 Apr 2022 07:35:56 UTC (286 KB)
[v3] Tue, 19 Jul 2022 08:34:20 UTC (286 KB)
