Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:1710.08381 (cs)

[Submitted on 23 Oct 2017]

Title: Near-Optimal Clustering in the $k$-machine model

Authors: Sayan Bandyapadhyay, Tanmay Inamdar, Shreyas Pai, Sriram V. Pemmaraju


Abstract: The clustering problem, in its many variants, has numerous applications in operations research and computer science (e.g., in bioinformatics, image processing, and social network analysis). As data sets have grown rapidly in size, researchers have focused on designing algorithms for clustering problems in models of computation suited to large-scale computation, such as MapReduce, Pregel, and streaming models. The $k$-machine model (Klauck et al., SODA 2015) is a simple, message-passing model for large-scale distributed graph processing. This paper considers three of the most prominent clustering problems, namely the uncapacitated facility location problem, the $p$-median problem, and the $p$-center problem, and presents $O(1)$-factor approximation algorithms for these problems running in $\tilde{O}(n/k)$ rounds in the $k$-machine model. These algorithms are optimal up to polylogarithmic factors because this paper also shows $\tilde{\Omega}(n/k)$ lower bounds for obtaining polynomial-factor approximation algorithms for these problems. These are the first results for clustering problems in the $k$-machine model.
We assume that the input metric for these clustering problems is provided only implicitly, as an edge-weighted graph. In a nutshell, our main technical contribution is to show that constant-factor approximation algorithms for all three clustering problems can be obtained by learning only a small portion of the input metric.
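
As an illustration only (this is not the paper's algorithm, and it is sequential rather than distributed), the following minimal Python sketch shows what an implicitly provided metric means and how the three objectives are evaluated for a given set of centers. It assumes the standard textbook definitions: the metric is the shortest-path distance in the edge-weighted graph, the $p$-median cost is the sum of distances to the nearest center, the $p$-center cost is the maximum such distance, and the facility location cost adds opening costs to the $p$-median cost. The names `shortest_paths`, `clustering_costs`, `facility_location_cost`, `GRAPH`, and the opening costs are hypothetical and chosen for this example.

```python
import heapq

INF = float("inf")


def shortest_paths(graph, source):
    """Dijkstra from `source`; `graph` maps node -> {neighbor: edge weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def clustering_costs(graph, centers):
    """Return (p-median cost, p-center cost) of `centers` under the
    shortest-path metric induced by `graph`."""
    per_center = [shortest_paths(graph, c) for c in centers]
    nearest = {v: min(d.get(v, INF) for d in per_center) for v in graph}
    return sum(nearest.values()), max(nearest.values())


def facility_location_cost(graph, opening_costs, facilities):
    """Opening costs of the chosen facilities plus total connection cost."""
    connection, _ = clustering_costs(graph, facilities)
    return sum(opening_costs[f] for f in facilities) + connection


# Hypothetical input: an undirected path a - b - c - d with weights 1, 2, 3.
GRAPH = {
    "a": {"b": 1},
    "b": {"a": 1, "c": 2},
    "c": {"b": 2, "d": 3},
    "d": {"c": 3},
}

if __name__ == "__main__":
    print(clustering_costs(GRAPH, ["b"]))
    print(clustering_costs(GRAPH, ["b", "d"]))
    print(facility_location_cost(GRAPH, {"b": 4, "d": 1}, ["b", "d"]))
```

Note that the sketch computes a full shortest-path tree from every center, which is exactly the kind of exhaustive learning of the metric that the abstract says the paper's algorithms avoid.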

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
Cite as:	arXiv:1710.08381 [cs.DC]
	(or arXiv:1710.08381v1 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.1710.08381

Submission history

From: Shreyas Pai
[v1] Mon, 23 Oct 2017 16:57:50 UTC (30 KB)