Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2409.00657 (cs)

[Submitted on 1 Sep 2024 (v1), last revised 8 Sep 2024 (this version, v2)]

Title: HopGNN: Boosting Distributed GNN Training Efficiency via Feature-Centric Model Migration

Authors: Weijian Chen, Shuibing He, Haoyang Qu, Xuechen Zhang

Abstract: Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features to GNN models, which leads to a significant communication bottleneck. Recognizing that the model size is often significantly smaller than the feature size, we propose LeapGNN, a feature-centric framework that reverses this paradigm by bringing GNN models to vertex features. To make it truly effective, we first propose a micrograph-based training strategy that trains the model using a refined structure with superior locality to reduce remote feature retrieval. Then, we devise a feature pre-gathering approach that merges multiple fetch operations into a single one to eliminate redundant feature transmissions. Finally, we employ a micrograph-based merging method that adjusts the number of micrographs for each worker to minimize kernel switches and synchronization overhead. Our experimental results demonstrate that LeapGNN achieves a performance speedup of up to 4.2x compared to the state-of-the-art method, namely P3.
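
The pre-gathering idea in the abstract (merging multiple fetch operations into one to avoid redundant feature transmissions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `pregather`, the request format (one list of vertex IDs per micrograph), and the `owner_of` partition function are all assumptions made for illustration only.

```python
from collections import defaultdict


def pregather(requests, owner_of):
    """Merge per-micrograph feature requests into one deduplicated
    fetch list per worker.

    requests: list of vertex-ID lists, one per micrograph (assumed format).
    owner_of: maps a vertex ID to the worker holding its features
              (hypothetical partition function).
    """
    merged = defaultdict(set)
    for vertex_ids in requests:
        for v in vertex_ids:
            # A set drops vertices requested by more than one micrograph.
            merged[owner_of(v)].add(v)
    # One sorted fetch list per worker instead of one fetch per micrograph.
    return {worker: sorted(ids) for worker, ids in merged.items()}


# Three micrographs whose requested vertices overlap.
requests = [[1, 2, 5], [2, 5, 8], [5, 8, 9]]

# Toy partitioning across two workers (assumption: owner = ID mod 2).
plan = pregather(requests, owner_of=lambda v: v % 2)

naive_transfers = sum(len(r) for r in requests)
merged_transfers = sum(len(ids) for ids in plan.values())
print(plan, naive_transfers, merged_transfers)
```

In this toy case, fetching per micrograph would move nine feature vectors, while the merged plan moves five, because vertices shared between micrographs are sent only once.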

Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as: arXiv:2409.00657 [cs.DC]
(or arXiv:2409.00657v2 [cs.DC] for this version)
https://doi.org/10.48550/arXiv.2409.00657

Submission history

From: Weijian Chen
[v1] Sun, 1 Sep 2024 08:16:58 UTC (1,906 KB)
[v2] Sun, 8 Sep 2024 08:07:45 UTC (1,907 KB)
