×

An algorithm for online \(k\)-means clustering. (English) Zbl 1430.68455

Goodrich, Michael (ed.) et al., Proceedings of the 18th workshop on algorithm engineering and experiments, ALENEX ’16, Arlington, VA, USA, January 10, 2016. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 81-89 (2016).
Summary: This paper shows that one can be competitive with the \(k\)-means objective while operating online. In this model, the algorithm receives vectors \(v_1, \dots, v_n\) one by one in an arbitrary order. For each vector \(v_t\) the algorithm outputs a cluster identifier before receiving \(v_{t+1}\). Our online algorithm generates \(O(k \log n \log \gamma n)\) clusters whose expected \(k\)-means cost is \(O(W^\ast \log n)\). Here, \(W^\ast\) is the optimal \(k\)-means cost using \(k\) clusters and \(\gamma\) is the aspect ratio of the data. The dependence on \(\gamma\) is shown to be unavoidable and tight. We also show that, experimentally, it is not much worse than \(k\)-means++ while operating in a strictly more constrained computational model.
For the entire collection see [Zbl 1331.68012].

MSC:

68W27 Online algorithms; streaming algorithms
62H30 Classification and discrimination; cluster analysis (statistical aspects)
68T05 Learning and adaptive systems in artificial intelligence
68W40 Analysis of algorithms