×

A novel optimization approach towards improving separability of clusters. (English) Zbl 1542.62086

Summary: The objective functions in optimization models of the sum-of-squares clustering problem reflect intra-cluster similarity and inter-cluster dissimilarities and in general, optimal values of these functions can be considered as appropriate measures for compactness of clusters. However, the use of the objective function alone may not lead to the finding of separable clusters. To address this shortcoming in existing models for clustering, we develop a new optimization model where the objective function is represented as a sum of two terms reflecting the compactness and separability of clusters. Based on this model we develop a two-phase incremental clustering algorithm. In the first phase, the clustering function is minimized to find compact clusters and in the second phase, a new model is applied to improve the separability of clusters. The Davies-Bouldin cluster validity index is applied as an additional measure to compare the compactness of clusters and silhouette coefficients are used to estimate the separability of clusters. The performance of the proposed algorithm is demonstrated and compared with that of four other algorithms using synthetic and real-world data sets. Numerical results clearly show that in comparison with other algorithms the new algorithm is able to find clusters with better separability and similar compactness.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
90C26 Nonconvex programming, global optimization
Full Text: DOI

References:

[1] Al-Sultan, K. S., A tabu search approach to the clustering problem, Pattern Recognit., 28, 9, 1443-1451 (1995)
[2] An, L. T.H.; Belghiti, M. T.; Tao, P. D., A new efficient algorithm based on DC programming and DCA for clustering, J. Global Optim., 37, 4, 593-608 (2007) · Zbl 1198.90327
[3] An, L.; Minh, L.; Tao, P., New and efficient DCA based algorithms for minimum sum-of-squares clustering, Pattern Recognit., 47, 1, 388-401 (2014) · Zbl 1326.68225
[4] Apcluster, L. (2022), https://cran.r-project.org/web/packages/apcluster/apcluster.pdf, (accessed in May 2022)
[5] Arbelaitz, O.; Gurrutxaga, I.; Mugueiza, J.; Perez, J.; Perona, I., An extensive comparative study of cluster validity indices, Pattern Recognit., 46, 1, 243-256 (2013)
[6] Bagirov, A., Modified global \(k\)-means algorithm for sum-of-squares clustering problems, Pattern Recognit., 41, 10, 3192-3199 (2008) · Zbl 1147.68669
[7] Bagirov, A. M.; Karasözen, B.; Sezer, M., Discrete gradient method: derivative-free method for nonsmooth optimization, J. Optim. Theory Appl., 137, 2, 317-334 (2008) · Zbl 1165.90021
[8] Bagirov, A. M.; Karmitsa, N.; Taheri, S., Partitional Clustering Via Nonsmooth Optimization (2020), Springer, Cham · Zbl 1448.68001
[9] Bagirov, A.; Mohebi, E., Nonsmooth optimization based algorithms in cluster analysis, (Celebi, M., Partitional Clustering Algorithms (2015), Springer, Cham), 99-146
[10] Bagirov, A.; Ordin, B.; Ozturk, G.; Xavier, A., An incremental clustering algorithm based on hyperbolic smoothing, Comput. Optim. Appl., 61, 1, 219-241 (2015) · Zbl 1309.90098
[11] Bagirov, A.; Taheri, S.; Ugon, J., Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems, Pattern Recognit., 53, 12-24 (2016) · Zbl 1412.68200
[12] Bagirov, A.; Ugon, J., An algorithm for minimizing clustering functions, Optimization, 54, 4-5, 351-368 (2005) · Zbl 1122.90059
[13] Bagirov, A.; Ugon, J.; Webb, D., Fast modified global \(k\)-means algorithm for incremental cluster construction, Pattern Recognit., 44, 4, 866-876 (2011) · Zbl 1213.68514
[14] Bagirov, A.; Yearwood, J., A new nonsmooth optimization algorithm for minimum sum-of-squares clustering problems, European J. Oper. Res., 170, 2, 578-596 (2006) · Zbl 1085.90045
[15] Batool, F.; Hennig, C., Clustering with the average silhouette width, Comput. Statist. Data Anal., 158, Article 107190 pp. (2021) · Zbl 1510.62273
[16] Bock, H., Clustering and neural networks, (Rizzi, A.; Vichi, M.; Bock, H., Advances in Data Science and Classification (1998), Springer, Berlin), 265-277 · Zbl 1051.91523
[17] Cluster, H. (2022), https://cran.r-project.org/web/packages/ClusterR/vignettes/the_clusterR_package.html
[18] Davies, D.; Bouldin, D., A cluster separation measure, IEEE Trans. Pattern Anal. Mach. Intell., 2, 224-227 (1979)
[19] Dua, D.; Graff, C., UCI Machine Learning Repository (2019), University of California, Irvine, School of Information and Computer Sciences, URL http://archive.ics.uci.edu/ml
[20] Dunn, J., Well-separated clusters and optimal fuzzy partitions, J. Cybern., 4, 1, 95-104 (1974) · Zbl 0304.68093
[21] Frey, B.; Dueck, D., Clustering by passing messages between data points, Science, 315, 972-976 (2007) · Zbl 1226.94027
[22] Gagolewski, M.; Bartoszuk, M.; Cena, A., Are cluster validity measures (in)valid?, Inform. Sci., 581, 620-636 (2021)
[23] Hennig, C., What are the true clusters?, Pattern Recognit. Lett., 64, 53-62 (2015)
[24] Hubert, L.; Arabie, P., Comparing partitions, J. Classification, 2, 1, 193-218 (1985)
[25] Jain, A., Data clustering: 50 years beyond \(k\)-means, Pattern Recognit. Lett., 31, 8, 651-666 (2010)
[26] Jain, A.; Dubes, R., Algorithms for Clustering Data (1988), Prentice Hall: Prentice Hall Upper Saddle River, NJ · Zbl 0665.62061
[27] Karmitsa, N.; Bagirov, A.; Taheri, S., New diagonal bundle method for clustering problems in large data sets, European J. Oper. Res., 263, 2, 367-379 (2017) · Zbl 1380.90226
[28] Karmitsa, N.; Bagirov, A.; Taheri, S., Clustering in large data sets with the limited memory bundle method, Pattern Recognit., 83, 245-259 (2018)
[29] Kolesnikov, A.; Trichina, E.; Kauranne, T., Estimating the number of clusters in a numerical data set via quantization error modeling, Pattern Recognit., 48, 941-952 (2015)
[30] Krislock, N.; Malick, J.; Roupin, F., Computational results of a semidefinite branch-and-bound algorithm for \(k\)-cluster, Comput. Oper. Res., 66, 153-159 (2016) · Zbl 1349.90715
[31] Licors, N. (2015), https://cran.r-project.org/web/packages/LICORS/LICORS.pdf, (accessed in August 2021)
[32] Likas, A.; Vlassis, N.; Verbeek, J., The global \(k\)-means clustering algorithm, Pattern Recognit., 36, 2, 451-461 (2003)
[33] Milligan, G.; Cooper, M., An examination of procedures for determining the number of clusters in a data set, Psychometrika, 50, 2, 159-179 (1985)
[34] Murtagh, F.; Legendre, P., Ward’s hierarchical agglomerative clustering method: Which algorithms implement ward’s criterion?, J. Classification, 31, 274-295 (2014) · Zbl 1360.62344
[35] Ordin, B.; Bagirov, A. M., A heuristic algorithm for solving the minimum sum-of-squares clustering problems, J. Global Optim., 61, 2, 341-361 (2015) · Zbl 1311.90111
[36] Pacheco, J., A scatter search approach for the minimum sum-of-squares clustering problem, Comput. Oper. Res., 32, 1325-1335 (2005) · Zbl 1075.90037
[37] Rahman, M. A.; Islam, M. Z., A hybrid clustering technique combining a novel genetic algorithm with \(k\)-means, Knowl.-Based Syst., 71, 345-365 (2014)
[38] Reinelt, G., TSP-LIB-A traveling salesman problem library, ORSA J. Comput., 3, 319-350 (1991)
[39] Rousseeuw, P. J., Silhouettes: A graphical aid to the interpretation and validation of cluster analysis, J. Comput. Appl. Math., 20, 53-65 (1987) · Zbl 0636.62059
[40] Saha, J.; Mukherjee, J., CNAK: Cluster number assisted \(k\)-means, Pattern Recognit., 110, Article 107625 pp. (2021)
[41] Taheri, S., COMSEP-clust (2021), https://github.com/SnTa2019/Clustering-via-Nonsmooth-Optimization
[42] Vassilvitskii, S., Arthur, D., 2007. k-means++: The advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA’07). pp. 1027-1035. · Zbl 1302.68273
[43] Xavier, A., The hyperbolic smoothing clustering method, Pattern Recognit., 43, 3, 731-737 (2010) · Zbl 1187.68514
[44] Zhang, Y.; Mańdziuk, J.; Quek, C.; Goh, B., Curvature-based method for determining the number of clusters, Inform. Sci., 415-416, 414-428 (2017)
[45] Zhao, Q., Xu, M., Fränti, P., 2009. Sum-of-squares based cluster validity index and significance analysis. In: International Conference on Adaptive and Natural Computing Algorithms. pp. 313-322.
[46] Zhou, S.; Xu, Z., A novel internal validity index based on the cluster centre and the nearest neighbour cluster, Appl. Soft Comput., 71, 78-88 (2018)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.