An algorithm for online \(k\)-means clustering. (English) Zbl 1430.68455
Goodrich, Michael (ed.) et al., Proceedings of the 18th workshop on algorithm engineering and experiments, ALENEX ’16, Arlington, VA, USA, January 10, 2016. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 81-89 (2016).
Summary: This paper shows that one can be competitive with the \(k\)-means objective while operating online. In this model, the algorithm receives vectors \(v_1, \dots, v_n\) one by one in an arbitrary order. For each vector \(v_t\) the algorithm outputs a cluster identifier before receiving \(v_{t+1}\). Our online algorithm generates \(O(k \log n \log \gamma n)\) clusters whose expected \(k\)-means cost is \(O(W^\ast \log n)\). Here, \(W^\ast\) is the optimal \(k\)-means cost using \(k\) clusters and \(\gamma\) is the aspect ratio of the data. The dependence on \(\gamma\) is shown to be unavoidable and tight. We also show that, experimentally, it is not much worse than \(k\)-means++ while operating in a strictly more constrained computational model.
For the entire collection see [Zbl 1331.68012].
For the entire collection see [Zbl 1331.68012].
MSC:
68W27 | Online algorithms; streaming algorithms |
62H30 | Classification and discrimination; cluster analysis (statistical aspects) |
68T05 | Learning and adaptive systems in artificial intelligence |
68W40 | Analysis of algorithms |