×

Streaming algorithms for \(k\)-center clustering with outliers and with anonymity. (English) Zbl 1159.68672

Goel, Ashish (ed.) et al., Approximation, randomization and combinatorial optimization. Algorithms and techniques. 11th international workshop, APPROX 2008, and 12th international workshop, RANDOM 2008, Boston, MA, USA, August 25–27, 2008. Proceedings. Berlin: Springer (ISBN 978-3-540-85362-6/pbk). Lecture Notes in Computer Science 5171, 165-178 (2008).
Summary: Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the \(k\)-center clustering problem. We give a streaming \((4 + \epsilon )\)-approximation algorithm using \(O(\epsilon ^{ - 1} kz)\) memory for the problem with outliers, in which the clustering is allowed to drop up to \(z\) of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming \((6 + \epsilon )\)-approximation algorithm using \(O(\epsilon ^{ - 1} \ln (\epsilon ^{ - 1}) k + k ^{2})\) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.
For the entire collection see [Zbl 1149.68010].

MSC:

68W25 Approximation algorithms
68W10 Parallel algorithms in computer science
Full Text: DOI