×

An LP-based \(k\)-means algorithm for balancing weighted point sets. (English) Zbl 1381.62154

Summary: The classical \(k\)-means algorithm for partitioning \(n\) points in \(\mathbb{R}^d\) into \(k\) clusters is one of the most popular and widely spread clustering methods. The need to respect prescribed lower bounds on the cluster sizes has been observed in many scientific and business applications. In this paper, we present and analyze a generalization of \(k\)-means that is capable of handling weighted point sets and prescribed lower and upper bounds on the cluster sizes. We call it weight-balanced \(k\)-means. The key difference to existing models lies in the ability to handle the combination of weighted point sets with prescribed bounds on the cluster sizes. This imposes the need to perform partial membership clustering, and leads to significant differences. For example, while finite termination of all \(k\)-means variants for unweighted point sets is a simple consequence of the existence of only finitely many partitions of a given set of points, the situation is more involved for weighted point sets, as there are infinitely many partial membership clusterings. Using polyhedral theory, we show that the number of iterations of weight-balanced \(k\)-means is bounded above by \(n^O(dk)\), so in particular it is polynomial for fixed \(k\) and \(d\). This is similar to the known worst-case upper bound for classical \(k\)-means for unweighted point sets and unrestricted cluster sizes, despite the much more general framework. We conclude with the discussion of some additional favorable properties of our method.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
68Q32 Computational learning theory
90C05 Linear programming
90C90 Applications of mathematical programming
52B12 Special polytopes (linear programming, centrally symmetric, etc.)

References:

[1] Aloise, D.; Deshpande, A.; Hansen, P.; Popat, P., NP-hardness of Euclidean sum-of-squares clustering, Machine Learning, 75, 245-248 (2009) · Zbl 1378.68047
[2] Alon, N., Tools from higher algebra, (Graham, R.; Grötschel, M.; Lovász, L., Handbook of Combinatorics, vol. 2 (1995), MIT Press), 1749-1783 · Zbl 0848.05073
[3] Alpers, A.; Brieden, A.; Gritzmann, P.; Lyckegaard, A.; Poulsen, H., Generalized balanced power diagrams for 3D representations of polycrystals, Philosophical Magazine, 95, 1016-1028 (2015)
[4] Arthur, D.; Manthey, B.; Röglin, H., k-Means has polynomial smoothed complexity, Proceedings of the Fiftieth IEEE Symposium on Foundations of Computer Science, 405-414 (2009) · Zbl 1292.68187
[5] Arthur, D.; Manthey, B.; Röglin, H., Smoothed analysis of the k-means method, Journal of the Association for Computing Machinery, 58, 5, 19:1-19:31 (2010) · Zbl 1281.68224
[6] Aurenhammer, F., Power diagrams: Properties, algorithms and applications, SIAM Journal on Computing, 16, 1, 78-96 (1987) · Zbl 0616.52007
[7] Aurenhammer, F.; Hoffmann, F.; Aronov, B., Minkowski-type theorems and least-squares clustering, Algorithmica, 20, 61-76 (1998) · Zbl 0895.68135
[8] Aurenhammer, F.; Klein, R., Voronoi diagrams, (Sack, J.-R.; Urrutia, J., Handbook of Computational Geometry (1999), Elsevier Science), 201-290 · Zbl 0995.65024
[9] Awasthi, P.; Charikar, M.; Krishnaswamy, R.; Sinop, A. K., The hardness of approximation of Euclidean k-means, (Arge, L.; Pach, J., Proceedings of the Thirty-first International Symposium on Computational Geometry (2015)), 754-767 · Zbl 1378.68048
[10] Babaee, M.; Bahmanyar, R.; Rigoll, G.; Datcu, M., Interactive clustering for SAR image understanding, Proceedings of Tenth European Conference on Synthetic Aperture Radar (EUSAR), 1-4 (2014)
[11] Barnes, E. R.; Hoffman, A. J.; Rothblum, U. G., Optimal partitions having disjoint convex and conic hulls, Mathematical Programming, 54, 1, 69-86 (1992) · Zbl 0751.90068
[12] Basu, S.; Davidson, I.; Wagstaff, K. L., Constrained clustering: Advances in algorithms, theory, and applications (2009), Chapman & Hall/CRC · Zbl 1142.68005
[13] Borgwardt, S.; Brieden, A.; Gritzmann, P., Constrained minimum-\(k\)-star clustering and its application to the consolidation of farmland, Operational Research, 11, 1, 1-17 (2011) · Zbl 1365.90285
[14] Borgwardt, S.; Brieden, A.; Gritzmann, P., Geometric clustering for the consolidation of farmland and woodland, The Mathematical Intelligencer, 36, 2, 37-44 (2014) · Zbl 1401.52025
[15] Bradley, P. S.; Bennett, K. P.; Demiriz, A., Constrained k-means clustering (2000)
[16] Bredensteiner, E. J.; Bennett, K. P., Multicategory classification by support vector machines, Computational Optimizations and Applications, 12, 53-79 (1999) · Zbl 1040.90574
[17] Brieden, A.; Gritzmann, P., A quadratic optimization model for the consolidation of farmland by means of lend-lease agreements, (Ahr, D.; Fahrion, R.; Oswald, M.; Reinelt, G., Proceedings of the International Conference on Operations Research (2004)), 324-331 · Zbl 1059.90111
[18] Brieden, A.; Gritzmann, P., On clustering bodies: Geometry and polyhedral approximation, Discrete and Computational Geometry, 44, 3, 508-534 (2010) · Zbl 1211.52014
[19] Brieden, A.; Gritzmann, P., On optimal weighted balanced clusterings: Gravity bodies and power diagrams, SIAM Journal on Discrete Mathematics, 26, 415-434 (2012) · Zbl 1253.90189
[20] Crammer, K.; Singer, Y., On the algorithmic implementation of multiclass kernel-based vector machines, Journal of Machine Learning Research, 2, 265-292 (2001) · Zbl 1037.68110
[21] Demiriz, A.; Bennett, K. P.; Bradley, P. S., Using assignment constraints to avoid empty clusters in \(k\)-means clustering, (Basu, S.; Davidison, I.; Wagstaff, K., Constrained Clustering: Advances in Algorithms, Theory, and Applications. Constrained Clustering: Advances in Algorithms, Theory, and Applications, CRC data mining and knowledge discovery series (2008), Chapman & Hall)
[22] Hwang, F. K.; Onn, S.; Rothblum, U. G., A polynomial time algorithm for shaped partition problems, SIAM Journal on Optimization, 10, 1, 70-81 (1999) · Zbl 0955.90118
[23] Hwang, F. K.; Rothblum, U. G., Partitions - optimality and clustering: single-parameter, vol. I (2012), World Scientific · Zbl 1244.90003
[24] Hwang, F. K.; Rothblum, U. G.; Chen, H.-B., Partitions - optimality and clustering: multi-parameter, vol. II (2013), World Scientific · Zbl 1287.90055
[25] Inaba, M.; Katoh, N.; Imai, H., Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering, Proceedings of the Tenth Annual Symposium on Computational Geometry, 332-339 (1994)
[26] Lee, E.; Schmidt, M.; Wright, J., Improved and simplified inapproximability for \(k\)-means, Information Processing Letters, 120, 40-43 (2017) · Zbl 1400.68250
[27] Lloyd, S. P., Least squares quantization in PCM, IEEE Transactions on Information Theory, 28, 2, 129-137 (1982) · Zbl 0504.94015
[28] MacQueen, J. B., Some methods of classification and analysis of multivariate observations, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 281-297 (1967) · Zbl 0214.46201
[29] Mahajan, M.; Nimbhorkar, P.; Varadarajan, K., The planar k-means problem is NP-hard, Theoretical Computer Science, 442, 13-21 (2012) · Zbl 1260.68158
[30] Malinen, M. I.; Fränti, P., Balanced k-means for clustering, Structural, syntactic, and statistical pattern recognition. Structural, syntactic, and statistical pattern recognition, Lecture Notes in Computer Science, vol. 8621, 32-41 (2014), Springer
[31] Onn, S.; Schulman, L. J., The vector partition problem for convex objective functions, Mathematics of Operations Research, 26, 3, 583-590 (2001) · Zbl 1073.90535
[32] Roughgarden, T.; Wang, J. R., The complexity of the \(k\)-means method, Proceedings of the Twenty-fourth Annual European Symposium on Algorithms. Proceedings of the Twenty-fourth Annual European Symposium on Algorithms, Leibniz International Proceedings in Informatics (LIPIcs), vol. 57, 78:1-78:14 (2016) · Zbl 1397.68108
[33] Schölkopf, B.; Smola, A., Learning with kernels (2002), MIT Press
[34] Vapnik, V., Statistical learning theory (1998), Wiley · Zbl 0935.62007
[35] Vattani, A., k-Means requires exponentially many iterations even in the plane, Discrete and Computational Geometry, 45, 596-616 (2011) · Zbl 1218.68088
[36] Warren, H. E., Lower bounds for approximation by nonlinear manifolds, Transactions of the American Mathematical Society, 133, 1, 167-178 (1968) · Zbl 0174.35403
[37] Weston, J.; Watkins, C., Multi-class support vector machines (1998)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.