Document Zbl 1430.68455

Liberty, Edo; Sriharsha, Ram; Sviridenko, Maxim

An algorithm for online \(k\)-means clustering. (English) Zbl 1430.68455

Goodrich, Michael (ed.) et al., Proceedings of the 18th workshop on algorithm engineering and experiments, ALENEX ’16, Arlington, VA, USA, January 10, 2016. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 81-89 (2016).

Summary: This paper shows that one can be competitive with the \(k\)-means objective while operating online. In this model, the algorithm receives vectors \(v_1, \dots, v_n\) one by one in an arbitrary order. For each vector \(v_t\) the algorithm outputs a cluster identifier before receiving \(v_{t+1}\). Our online algorithm generates \(O(k \log n \log \gamma n)\) clusters whose expected \(k\)-means cost is \(O(W^\ast \log n)\). Here, \(W^\ast\) is the optimal \(k\)-means cost using \(k\) clusters and \(\gamma\) is the aspect ratio of the data. The dependence on \(\gamma\) is shown to be unavoidable and tight. We also show that, experimentally, it is not much worse than \(k\)-means++ while operating in a strictly more constrained computational model.
For the entire collection see [Zbl 1331.68012].

Cited in 6 Documents

MSC:

68W27	Online algorithms; streaming algorithms
62H30	Classification and discrimination; cluster analysis (statistical aspects)
68T05	Learning and adaptive systems in artificial intelligence
68W40	Analysis of algorithms

Cite Review PDF

Full Text: DOI arXiv