research-article

Neural PathSim for Inductive Similarity Search in Heterogeneous Information Networks

Authors:

Vincent W. Zheng,

Yangqiu SongAuthors Info & Claims

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

Pages 2201 - 2210

https://doi.org/10.1145/3459637.3482454

Published: 30 October 2021 Publication History

Abstract

PathSim is a widely used meta-path-based similarity in heterogeneous information networks. Numerous applications rely on the computation of PathSim, including similarity search and clustering. Computing PathSim scores on large graphs is computationally challenging due to its high time and storage complexity. In this paper, we propose to transform the problem of approximating the ground truth PathSim scores into a learning problem. We design an encoder-decoder based framework, NeuPath, where the algorithmic structure of PathSim is considered. Specifically, the encoder module identifies Top T optimized path instances, which can approximate the ground truth PathSim, and maps each path instance to an embedding vector. The decoder transforms each embedding vector into a scalar respectively, which identifies the similarity score. We perform extensive experiments on two real-world datasets in different domains, ACM and IMDB. Our results demonstrate that NeuPath performs better than state-of-the-art baselines in the PathSim approximation task and similarity search task.

References

[1]

Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018.

[2]

Phuc Do and Phu Pham. Dw-pathsim: a distributed computing model for topic-driven weighted meta-path-based similarity measure in a large-scale content-based heterogeneous information network. Journal of Information and Telecommunication, 3 (1): 19--38, 2019.

[3]

Yuxiao Dong, Nitesh V Chawla, and Ananthram Swami. metapath2vec: Scalable representation learning for heterogeneous networks. In SIGKDD, pages 135--144, 2017.

Digital Library

[4]

Changjun Fan, Li Zeng, Yuhui Ding, Muhao Chen, Yizhou Sun, and Zhong Liu. Learning to identify high betweenness centrality nodes from scratch: A novel graph neural network approach. In CIKM, pages 559--568, 2019.

Digital Library

[5]

Yuan Fang, Wenqing Lin, Vincent W Zheng, Min Wu, Kevin Chen-Chuan Chang, and Xiao-Li Li. Semantic proximity search on graphs with metagraph-based learning. In ICDE, pages 277--288, 2016.

[6]

Francois Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on knowledge and data engineering, 19 (3): 355--369, 2007.

Digital Library

[7]

Justin Gilmer, Samuel S Schoenholz, Patrick F Riley, Oriol Vinyals, and George E Dahl. Neural message passing for quantum chemistry. ICML, 2017.

Digital Library

[8]

Aditya Grover and Jure Leskovec. node2vec: Scalable feature learning for networks. In SIGKDD, pages 855--864, 2016.

Digital Library

[9]

William L Hamilton, Rex Ying, and Jure Leskovec. Inductive representation learning on large graphs. In NeurIPS, pages 1025--1035, 2017.

Digital Library

[10]

Yu He, Yangqiu Song, Jianxin Li, Cheng Ji, Jian Peng, and Hao Peng. Hetespaceywalk: A heterogeneous spacey random walk for heterogeneous information network embedding. In CIKM, pages 639--648, 2019.

Digital Library

[11]

Anahita Hosseini, Ting Chen, Wenjun Wu, Yizhou Sun, and Majid Sarrafzadeh. Heteromed: Heterogeneous information network for medical diagnosis. In CIKM, pages 763--772, 2018.

Digital Library

[12]

Shifu Hou, Yanfang Ye, Yangqiu Song, and Melih Abdulhayoglu. Hindroid: An intelligent android malware detection system based on structured heterogeneous information network. In SIGKDD, pages 1507--1515, 2017.

Digital Library

[13]

Ziniu Hu, Yuxiao Dong, Kuansan Wang, and Yizhou Sun. Heterogeneous graph transformer. In WWW, pages 2704--2710, 2020.

Digital Library

[14]

elin and Kek"al"ainen(2002)]jarvelin2002cumulatedKalervo J"arvelin and Jaana Kek"al"ainen. Cumulated gain-based evaluation of ir techniques. ACM Transactions on Information Systems (TOIS), 20 (4): 422--446, 2002.

Digital Library

[15]

Glen Jeh and Jennifer Widom. Simrank: a measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 538--543, 2002.

Digital Library

[16]

Glen Jeh and Jennifer Widom. Scaling personalized web search. In WWW, pages 271--279, 2003.

Digital Library

[17]

Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. ICLR, 2017.

[18]

Yehuda Koren. Factorization meets the neighborhood: a multifaceted collaborative filtering model. In SIGKDD, pages 426--434, 2008.

Digital Library

[19]

]lao2010fastNi Lao and William W Cohen. Fast query execution for retrieval models based on path-constrained random walks. In SIGKDD, pages 881--888, 2010 a.

Digital Library

[20]

]lao2010relationalNi Lao and William W Cohen. Relational retrieval using a combination of path-constrained random walks. Machine learning, 81 (1): 53--67, 2010 b.

Digital Library

[21]

Pan Li, Yanbang Wang, Hongwei Wang, and Jure Leskovec. Distance encoding: Design provably more powerful neural networks for graph representation learning. In NeurIPS, 2020.

[22]

Xin Liu, Haojie Pan, Mutian He, Yangqiu Song, Xin Jiang, and Lifeng Shang. Neural subgraph isomorphism counting. In SIGKDD, pages 1959--1969, 2020.

Digital Library

[23]

Zhaoyu Lou, Jiaxuan You, Chengtao Wen, Arquimedes Canedo, Jure Leskovec, et al. Neural subgraph matching. arXiv preprint arXiv:2007.03092, 2020.

[24]

Changping Meng, Reynold Cheng, Silviu Maniu, Pierre Senellart, and Wangda Zhang. Discovering meta-paths in large heterogeneous information networks. In WWW, pages 754--764, 2015.

Digital Library

[25]

Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. The pagerank citation ranking: Bringing order to the web. Technical report, Stanford InfoLab, 1999.

[26]

Adam Santoro, David Raposo, David G Barrett, Mateusz Malinowski, Razvan Pascanu, Peter Battaglia, and Timothy Lillicrap. A simple neural network module for relational reasoning. In NeurIPS, pages 4967--4976, 2017.

Digital Library

[27]

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In ESWC, pages 593--607, 2018.

Digital Library

[28]

Chuan Shi, Xiangnan Kong, Yue Huang, S Yu Philip, and Bin Wu. Hetesim: A general framework for relevance measure in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering, 26 (10): 2479--2492, 2014.

[29]

Chuan Shi, Yitong Li, Jiawei Zhang, Yizhou Sun, and S Yu Philip. A survey of heterogeneous information network analysis. IEEE Transactions on Knowledge and Data Engineering, 29 (1): 17--37, 2016.

Digital Library

[30]

Chuan Shi, Binbin Hu, Wayne Xin Zhao, and S Yu Philip. Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering (TKDE), 31 (2): 357--370, 2018.

Digital Library

[31]

Yizhou Sun, Jiawei Han, Xifeng Yan, Philip S. Yu, and Tianyi Wu. Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. VLDB, pages 992--1003, 2011.

Digital Library

[32]

Yizhou Sun, Brandon Norick, Jiawei Han, Xifeng Yan, Philip S. Yu, and Xiao Yu. Integrating meta-path selection with user-guided object clustering in heterogeneous information networks. In SIGKDD, pages 1348--1356, 2012.

Digital Library

[33]

Jian Tang, Meng Qu, and Qiaozhu Mei. Pte: Predictive text embedding through large-scale heterogeneous text networks. In SIGKDD, pages 1165--1174, 2015.

Digital Library

[34]

Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast random walk with restart and its applications. In ICDM, pages 613--622. IEEE, 2006.

Digital Library

[35]

ović et al.(2018)Velivc ković, Cucurull, Casanova, Romero, Lio, and Bengio]velivckovic2017graphPetar Velivc ković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. ICLR, 2018.

[36]

Petar Velikovi, Rex Ying, Matilde Padovano, Raia Hadsell, and Charles Blundell. Neural execution of graph algorithms. In ICLR, 2020.

[37]

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. Knowsim: A document similarity measure on structured heterogeneous information networks. In ICDM, pages 1015--1020, 2015.

Digital Library

[38]

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, and Jiawei Han. Text classification with heterogeneous information network kernels. In AAAI, volume 30, 2016.

Digital Library

[39]

Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S. Yu. Heterogeneous graph attention network. In WWW, pages 2022--2032, 2019.

Digital Library

[40]

Keylu Xu, Jingling Li, Mozhi Zhang, Simon S Du, Ken-ichi Kawarabayashi, and Stefanie Jegelka. What can neural networks reason about? ICLR, 2020.

[41]

Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. How powerful are graph neural networks? In ICLR, 2019.

[42]

Xiaowei Xu, Nurcan Yuruk, Zhidan Feng, and Thomas AJ Schweiger. Scan: a structural clustering algorithm for networks. In SIGKDD, pages 824--833, 2007.

Digital Library

[43]

Xiao Yu, Xiang Ren, Yizhou Sun, Quanquan Gu, Bradley Sturt, Urvashi Khandelwal, Brandon Norick, and Jiawei Han. Personalized entity recommendation: A heterogeneous information network approach. In WSDM, pages 283--292, 2014.

Digital Library

[44]

Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. Graph transformer networks. In NeurIPS, pages 11960--11970, 2019.

Digital Library

[45]

Huan Zhao, Quanming Yao, Jianda Li, Yangqiu Song, and Dik Lun Lee. Meta-graph based recommendation fusion over heterogeneous information networks. In SIGKDD, pages 635--644, 2017.

Digital Library

Cited By

Zhang JZan HWu SZhang KHuo J(2024)Adaptive Graph Neural Network with Incremental Learning Mechanism for Knowledge Graph ReasoningElectronics10.3390/electronics1314277813:14(2778)Online publication date: 15-Jul-2024
https://doi.org/10.3390/electronics13142778
Li TShou PWan XLi QWang RJia CXiao Y(2024)A fast malware detection model based on heterogeneous graph similarity searchComputer Networks10.1016/j.comnet.2024.110799254(110799)Online publication date: Dec-2024
https://doi.org/10.1016/j.comnet.2024.110799
Zhang YYu MZhang TYu G(2022)Semantic enhanced Top-k similarity search on weighted HINNeural Computing and Applications10.1007/s00521-022-07339-634:19(16911-16927)Online publication date: 1-Oct-2022
https://dl.acm.org/doi/10.1007/s00521-022-07339-6

Index Terms

Neural PathSim for Inductive Similarity Search in Heterogeneous Information Networks
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Similarity measures
      2. Top-k retrieval in databases
2. Mathematics of computing
  1. Discrete mathematics
    1. Graph theory
      1. Graph algorithms

Recommendations

PathSim: meta path-based top-K similarity search in heterogeneous information networks

Similarity search is a primitive operation in database and Web search engines. With the advent of large-scale heterogeneous information networks that consist of multi-typed, interconnected objects, such as the bibliographic networks and social media ...
MSLS: Meta-graph Search with Learnable Supernet for Heterogeneous Graph Neural Networks
SSDBM '23: Proceedings of the 35th International Conference on Scientific and Statistical Database Management

In recent years, heterogeneous graph neural networks (HGNNs) have achieved excellent performance. The efficient HGNNs consist of meta-graphs and aggregation operations. Since manually designing meta-graph is an expert-dependent and time-consuming ...
DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neural Networks
KDD '21: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining

In this paper, we propose a novel framework to automatically utilize task-dependent semantic information which is encoded in heterogeneous information networks (HINs). Specifically, we search for a meta graph, which can capture more complex semantic ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

CIKM '21: Proceedings of the 30th ACM International Conference on Information & Knowledge Management

October 2021

4966 pages

ISBN:9781450384469

DOI:10.1145/3459637

General Chairs:
Gianluca Demartini
The University of Queensland, Australia
,
Guido Zuccon
The University of Queensland, Australia
,
Program Chairs:
J. Shane Culpepper
RMIT University, Australia
,
Zi Huang
The University of Queensland, Australia
,
Hanghang Tong
University of Illinois at Urbana-Champaign, USA

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 October 2021

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Funding Sources

NSFC
RIF
GRF
MHKJFS

Conference

CIKM '21

Sponsor:

CIKM '21: The 30th ACM International Conference on Information and Knowledge Management

November 1 - 5, 2021

Queensland, Virtual Event, Australia

Acceptance Rates

Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

3
Total Citations
View Citations
171
Total Downloads

Downloads (Last 12 months)27
Downloads (Last 6 weeks)4

Reflects downloads up to 23 Oct 2024

Other Metrics

View Author Metrics

Citations

Cited By

Zhang JZan HWu SZhang KHuo J(2024)Adaptive Graph Neural Network with Incremental Learning Mechanism for Knowledge Graph ReasoningElectronics10.3390/electronics1314277813:14(2778)Online publication date: 15-Jul-2024
https://doi.org/10.3390/electronics13142778
Li TShou PWan XLi QWang RJia CXiao Y(2024)A fast malware detection model based on heterogeneous graph similarity searchComputer Networks10.1016/j.comnet.2024.110799254(110799)Online publication date: Dec-2024
https://doi.org/10.1016/j.comnet.2024.110799
Zhang YYu MZhang TYu G(2022)Semantic enhanced Top-k similarity search on weighted HINNeural Computing and Applications10.1007/s00521-022-07339-634:19(16911-16927)Online publication date: 1-Oct-2022
https://dl.acm.org/doi/10.1007/s00521-022-07339-6

View Options

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Media

Figures

Other

Tables

View Table of Contents