×

A review on declarative approaches for constrained clustering. (English) Zbl 07885918

Summary: Clustering is an important Machine Learning task, which aims at discovering the implicit structure of data. Applying a clustering algorithm is easy but since clustering is an unsupervised task, tuning it so that the results is appropriate to the expert expectations is much less obvious. To overcome this, expert knowledge can be integrated into a clustering process; this is generally formalized as constraints on the desired output, thus leading to constrained clustering. There are two lines of research for clustering: distance based clustering, where data are grouped into clusters according to their dissimilarity and conceptual clustering, where a cluster must be a concept that is a set of objects and a set of properties that describe them. This second approach relies on Formal Concept Analysis and benefits from advances in Pattern Mining. [66] has shown the interest of declarative approaches for pattern mining and has led to a new research direction for clustering that is interested in the use of declarative frameworks, such as Integer Linear Programming, Constraint Programming or SAT. This has several advantages: finding a global optimum, integrating different kinds of constraints, even complex ones in a clustering process and even combining conceptual and distance-based clustering. In this paper we present an inventory of constraints and a survey of declarative methods for constrained clustering.

MSC:

68T37 Reasoning under uncertainty in the context of artificial intelligence

Software:

LCM; ECCLAT; FaceNet
Full Text: DOI

References:

[1] Aloise, Daniel; Deshpande, Amit; Hansen, Pierre; Popat, Preyas, Np-hardness of euclidean sum-of-squares clustering, Mach. Learn., 75, 2, 245-248, May 2009 · Zbl 1378.68047
[2] Daniel, Aloise; Hansen, Pierre; Liberti, Leo, An improved column generation algorithm for minimum sum-of-squares clustering, Math. Program., 131, 1-2, 195-220, 2012 · Zbl 1236.90095
[3] Agrawal, Rakesh; Imielinski, Tomasz; Swami, Arun N., Mining association rules between sets of items in large databases, (Buneman, Peter; Jajodia, Sushil, Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, USA, May 26-28, 1993, 1993, ACM Press), 207-216
[4] Basu, Sugato; Bilenko, Mikhail; Banerjee, Arindam; Mooney, Raymond J., Probabilistic semi-supervised clustering with constraints, (Chapelle, Olivier; Schölkopf, Bernhard; Zien, Alexander, Semi-Supervised Learning, 2006, The MIT Press), 73-102
[5] Bradley, P.; Bennett, K.; Demiriz, A., Constrained k-means clustering, 2000, Microsoft Research, Technical Report MSR-TR-2000-65
[6] Basu, S.; Banerjee, A.; Mooney, R., Active semi-supervision for pairwise constrained clustering, (Proceedings of the SIAM International Conference on Data Mining, 2004), 333-344
[7] Basu, S.; Bilenko, M.; Mooney, R., A probabilistic framework for semi-supervised clustering, (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2004), 59-68
[8] Bilenko, M.; Basu, S.; Mooney, R., Integrating constraints and metric learning in semi-supervised clustering, (Proceedings of the International Conference on Machine Learning, 2004), 11-18
[9] Babaki, Behrouz; Guns, Tias; Nijssen, Siegfried, Constrained Clustering Using Column Generation, (Simonis, Helmut, Integration of AI and OR Techniques in Constraint Programming - 11th International Conference, CPAIOR 2014, Cork, Ireland, May 19-23, 2014. Proceedings. Integration of AI and OR Techniques in Constraint Programming - 11th International Conference, CPAIOR 2014, Cork, Ireland, May 19-23, 2014. Proceedings, Lecture Notes in Computer Science, vol. 8451, 2014, Springer), 438-454 · Zbl 1407.68387
[10] (Biere, Armin; Heule, Marijn; van Maaren, Hans; Walsh, Toby, Handbook of Satisfiability. Handbook of Satisfiability, Frontiers in Artificial Intelligence and Applications, vol. 336, 2021, IOS Press) · Zbl 1456.68001
[11] Brusco, Michael J., A repetitive branch-and-bound procedure for minimum within-cluster sums of squares partitioning, Psychometrika, 71, 2, 347-363, 2006 · Zbl 1306.62387
[12] Caron, Mathilde; Bojanowski, Piotr; Joulin, Armand; Douze, Matthijs, Deep clustering for unsupervised learning of visual features, (Ferrari, Vittorio; Hebert, Martial; Sminchisescu, Cristian; Weiss, Yair, Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XIV. Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part XIV, Lecture Notes in Computer Science, vol. 11218, 2018, Springer), 139-156
[13] Chopra, Sumit; Hadsell, Raia; LeCun, Yann, Learning a similarity metric discriminatively, with application to face verification, (2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 20-26 June 2005, San Diego, CA, USA, 2005, IEEE Computer Society), 539-546
[14] Chierichetti, Flavio; Kumar, Ravi; Lattanzi, Silvio; Vassilvitskii, Sergei, Fair clustering through fairlets, (Advances in Neural Information Processing Systems, 2017), 5029-5037
[15] Cook, Stephen A., The complexity of theorem-proving procedures, (Harrison, Michael A.; Banerji, Ranan B.; Ullman, Jeffrey D., Proceedings of the 3rd Annual ACM Symposium on Theory of Computing, May 3-5, 1971, Shaker Heights, Ohio, USA, 1971, ACM), 151-158 · Zbl 0253.68020
[16] Chabert, Maxime; Solnon, Christine, Constraint programming for multi-criteria conceptual clustering, (Principles and Practice of Constraint Programming - 23rd International Conference, CP 2017, Proceedings, 2017), 460-476
[17] Darwiche, Adnan, Sdd: a new canonical representation of propositional knowledge bases, (IJCAI, 2011)
[18] Davidson, Ian, Clustering with constraints, (Liu, Ling; Tamer Özsu, M., Encyclopedia of Database Systems, 2018, Springer)
[19] Demiriz, A.; Bennett, K.; Bradley, P., Using assignment constraints to avoid empty clusters in k-means clustering, (Basu, S.; Davidson, I.; Wagstaff, K., Constrained Clustering: Advances in Algorithms, Theory, and Applications, 2008, Chapman & Hall/CRC), 201-220, chapter 9 · Zbl 1161.68762
[20] Demiriz, Ayhan; Bennett, Kristin P.; Bradley, Paul S., Using assignment constraints to avoid empty clusters in k-means clustering, (Constrained Clustering: Advances in Algorithms, Theory, and Applications, 2008)
[21] Durand, Nicolas; Crémilleux, Bruno, Ecclat: a new approach of clusters discovery in categorical data, (Twenty-Second Annual International Conference Knowledge Based Systems and Applied Artificial Intelligence (ES 2002), 2003, Springer), 177-190 · Zbl 1095.68634
[22] Dao, Thi-Bich-Hanh; Duong, Khanh-Chuong; Vrain, Christel, A declarative framework for constrained clustering, (Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2013), 419-434
[23] Dao, Thi-Bich-Hanh; Duong, Khanh-Chuong; Vrain, Christel, A filtering algorithm for constrained clustering with within-cluster sum of dissimilarities criterion, (Proceedings of the 25th International Conference on Tools with Artificial Intelligence, 2013), 1060-1067
[24] Dao, Thi-Bich-Hanh; Duong, Khanh-Chuong; Vrain, Christel, Constrained minimum sum of squares clustering by constraint programming, (Principles and Practice of Constraint Programming - 21st International Conference, CP 2015, Cork, Ireland, August 31 - September 4, 2015, Proceedings, 2015), 557-573 · Zbl 1318.68035
[25] Dao, Thi-Bich-Hanh; Duong, Khanh-Chuong; Vrain, Christel, Constrained clustering by constraint programming, Artif. Intell., 244, 70-94, 2017 · Zbl 1404.68141
[26] Defays, D., An efficient algorithm for a complete link method, Comput. J., 20, 4, 364-366, 1977 · Zbl 0364.68038
[27] Delattre, Michel; Hansen, Pierre, Bicriterion cluster analysis, IEEE Trans. Pattern Anal. Mach. Intell., Pami-2, 4, 277-291, 1980 · Zbl 0458.62049
[28] Dwork, Cynthia; Hardt, Moritz; Pitassi, Toniann; Reingold, Omer; Zemel, Richard S., Fairness through awareness, (Innovations in Theoretical Computer Science 2012, 2012), 214-226 · Zbl 1348.91230
[29] Dao, Thi-Bich-Hanh; Kuo, Chia-Tung; Ravi, S. S.; Vrain, Christel; Davidson, Ian, Descriptive clustering: ILP and CP formulations with applications, (Lang, Jérôme, Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden, 2018, ijcai.org), 1263-1269
[30] Davis, Martin; Logemann, George; Loveland, Donald W., A machine program for theorem-proving, Commun. ACM, 5, 7, 394-397, 1962 · Zbl 0217.54002
[31] du Merle, O.; Hansen, P.; Jaumard, B.; Mladenovic, N., An interior point algorithm for minimum sum-of-squares clustering, SIAM J. Sci. Comput., 21, 4, 1485-1505, 1999 · Zbl 1049.90129
[32] Davidson, Ian; Ravi, S. S., Clustering with constraints: feasibility issues and the k-means algorithm, (Kargupta, Hillol; Srivastava, Jaideep; Kamath, Chandrika; Goodman, Arnold, Proceedings of the 2005 SIAM International Conference on Data Mining, SDM 2005, Newport Beach, CA, USA, April 21-23, 2005, 2005, SIAM), 138-149
[33] Davidson, Ian; Ravi, S. S., The complexity of non-hierarchical clustering with instance and cluster level constraints, Data Min. Knowl. Discov., 14, 1, 25-61, 2007
[34] Davidson, I.; Ravi, S.; Shamis, L., A SAT-based framework for efficient constrained clustering, (Proceedings of the SIAM International Conference on Data Mining, 2010), 94-105
[35] Dao, Thi-Bich-Hanh; Vrain, Christel; Duong, Khanh-Chuong; Davidson, Ian, A framework for actionable clustering using constraint programming, (Kaminka, Gal A.; Fox, Maria; Bouquet, Paolo; Hüllermeier, Eyke; Dignum, Virginia; Dignum, Frank; van Harmelen, Frank, ECAI 2016 - 22nd European Conference on Artificial Intelligence. ECAI 2016 - 22nd European Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications, vol. 285, 2016, IOS Press), 453-461
[36] Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei, A density-based algorithm for discovering clusters in large spatial databases with noise, (Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M., Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), Portland, Oregon, USA, 1996, AAAI Press), 226-231
[37] Fisher, Douglas H., Knowledge acquisition via incremental conceptual clustering, Mach. Learn., 2, 2, 139-172, 1987
[38] Focacci, F.; Lodi, A.; Milano, M., Cost-based domain filtering, (Proceedings of the 5th International Conference on Principles and Practice of Constraint Programming, 1999), 189-203
[39] González-Almagro, Germán; Peralta, Daniel; De Poorter, Eli; Cano, José-Ramón; García, Salvador, Semi-Supervised Constrained Clustering: An in-Depth Overview, Ranked Taxonomy and Future Research Directions, 2023
[40] Gluck, M. A.; Corter, J. E., Information uncertainty and the utility of categories, (Proceedings of the Seventh Annual Conference of Cognitive Science Society, 1985), 283-287
[41] Guns, Tias; Dao, Thi-Bich-Hanh; Vrain, Christel; Duong, Khanh-Chuong, Repetitive branch-and-bound using constraint programming for constrained minimum sum-of-squares clustering, (ECAI 2016 - 22nd European Conference on Artificial Intelligence. ECAI 2016 - 22nd European Conference on Artificial Intelligence, Frontiers in Artificial Intelligence and Applications, vol. 285, 2016, IOS Press), 462-470
[42] Guo, Xifeng; Gao, Long; Liu, Xinwang; Yin, Jianping, Improved deep embedded clustering with local structure preservation, (IJCAI 2017, 2017), 1753-1759
[43] Guns, Tias; Nijssen, Siegfried; De Raedt, Luc, k-pattern set mining under constraints, IEEE Trans. Knowl. Data Eng., 25, 2, 402-418, 2013
[44] Gomes, Carla P.; Selman, Bart; Kautz, Henry A., Boosting combinatorial search through randomization, (Mostow, Jack; Rich, Chuck, Proceedings of the Fifteenth National Conference on Artificial Intelligence and Tenth Innovative Applications of Artificial Intelligence Conference, AAAI 98, IAAI 98, July 26-30, 1998, Madison, Wisconsin, USA, 1998, AAAI Press / The MIT Press), 431-437
[45] Guilbert, Mathieu; Vrain, Christel; Dao, Thi-Bich-Hanh; de Souto, Marcilio C. P., Anchored constrained clustering ensemble, (International Joint Conference on Neural Networks, IJCNN, 2022, Padua, Italy, July 18-23, 2022, 2022, IEEE), 1-8
[46] Hansen, Pierre; Delattre, Michel, Complete-link cluster analysis by graph coloring, J. Am. Stat. Assoc., 73, 362, 397-403, 1978 · Zbl 0432.05004
[47] Hansen, Pierre; Jaumard, Brigitte, Cluster analysis and mathematical programming, Math. Program., 79, 1-3, 191-215, 1997 · Zbl 0887.90182
[48] Ienco, Dino; Pensa, Ruggero G., Deep triplet-driven semi-supervised embedding clustering, (DS, 2019, Springer), 220-234
[49] Bayardo, Roberto J.; Schrag, Robert, Using CSP look-back techniques to solve real-world SAT instances, (Kuipers, Benjamin; Webber, Bonnie L., Proceedings of the Fourteenth National Conference on Artificial Intelligence and Ninth Innovative Applications of Artificial Intelligence Conference, AAAI 97, IAAI 97, July 27-31, 1997, Providence, Rhode Island, USA, 1997, AAAI Press / The MIT Press), 203-208
[50] Khiari, Mehdi; Boizumault, Patrice; Crémilleux, Bruno, Constraint programming for mining n-ary patterns, (Cohen, David, Principles and Practice of Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings. Principles and Practice of Constraint Programming - CP 2010 - 16th International Conference, CP 2010, St. Andrews, Scotland, UK, September 6-10, 2010. Proceedings, Lecture Notes in Computer Science, vol. 6308, 2010, Springer), 552-567
[51] Kumar, Nimit; Kummamuru, Krishna, Semisupervised clustering with metric learning using relative comparisons, IEEE Trans. Knowl. Data Eng., 20, 4, 496-503, 2008
[52] Kuo, Chia-Tung; Ravi, S. S.; Dao, Thi-Bich-Hanh; Vrain, Christel; Davidson, Ian, A framework for minimal clustering modification via constraint programming, (Singh, Satinder; Markovitch, Shaul, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, February 4-9, 2017, San Francisco, California, USA, 2017, AAAI Press), 1389-1395
[53] Lampert, Thomas Andrew; Dao, Thi-Bich-Hanh; Lafabregue, Baptiste; Serrette, Nicolas; Forestier, Germain; Crémilleux, Bruno; Vrain, Christel; Gançarski, Pierre, Constrained distance based clustering for time-series: a comparative and experimental study, Data Min. Knowl. Discov., 32, 6, 1663-1707, 2018
[54] Lloyd, Stuart P., Least squares quantization in PCM, IEEE Trans. Inf. Theory, 28, 2, 129-136, 1982 · Zbl 0504.94015
[55] MacQueen, J. B., Some methods for classification and analysis of multivariate observations, (Le Cam, L. M.; Neyman, J., Proc. of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, University of California Press), 281-297 · Zbl 0214.46201
[56] Métivier, Jean-Philippe; Boizumault, Patrice; Crémilleux, Bruno; Khiari, Mehdi; Loudni, Samir, Constrained clustering using SAT, (Hollmén, Jaakko; Klawonn, Frank; Tucker, Allan, Advances in Intelligent Data Analysis XI - 11th International Symposium, IDA 2012, Helsinki, Finland, October 25-27, 2012. Proceedings, 2012), 207-218
[57] Michalski, Ryszard S., A theory and methodology of inductive learning, Artif. Intell., 20, 2, 111-161, 1983
[58] Mueller, Marianne; Kramer, Stefan, Integer linear programming models for constrained clustering, (Pfahringer, Bernhard; Holmes, Geoffrey; Hoffmann, Achim G., Discovery Science - 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings. Discovery Science - 13th International Conference, DS 2010, Canberra, Australia, October 6-8, 2010. Proceedings, Lecture Notes in Computer Science, vol. 6332, 2010, Springer), 159-173
[59] Michalski, Ryszard S.; Stepp, Robert E., Automated construction of classifications: conceptual clustering versus numerical taxonomy, IEEE Trans. Pattern Anal. Mach. Intell., 5, 4, 396-410, 1983
[60] Ng, M., A note on constrained k-means algorithms, Pattern Recognit., 33, 3, 515-519, 2000
[61] Nghiem, N.-V.-D.; Vrain, C.; Dao, T.-B.-H., Knowledge integration in deep clustering, (ECMLPKDD 2022, 2022)
[62] Nghiem, Nguyen-Viet-Dung; Vrain, Christel; Dao, Thi-Bich-Hanh; Davidson, Ian, Constrained clustering via post-processing, (Appice, Annalisa; Tsoumakas, Grigorios; Manolopoulos, Yannis; Matwin, Stan, Discovery Science - 23rd International Conference, DS 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings. Discovery Science - 23rd International Conference, DS 2020, Thessaloniki, Greece, October 19-21, 2020, Proceedings, Lecture Notes in Computer Science, vol. 12323, 2020, Springer), 53-67
[63] Ouali, A.; Loudni, S.; Lebbah, Y.; Boizumault, P.; Zimmermann, A.; Loukil, L., Efficiently finding conceptual clustering models with integer linear programming, (Proceedings of the International Joint Conference on Artificial Intelligence, 2016), 647-654
[64] Papadimitriou, Christos H.; Steiglitz, Kenneth, Combinatorial Optimization: Algorithms and Complexity, 1982, Prentice-Hall · Zbl 0503.90060
[65] Régin, Jean-Charles, Arc consistency for global cardinality constraints with costs, (Proceedings of the 5th International Conference on Principles and Practice of Constraint Programming, 1999), 390-404
[66] De Raedt, Luc; Guns, Tias; Nijssen, Siegfried, Constraint programming for itemset mining, (Li, Ying; Liu, Bing; Sarawagi, Sunita, Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, USA, August 24-27, 2008, 2008, ACM), 204-212
[67] Ren, Yazhou; Hu, Kangrong; Dai, Xinyi; Pan, Lili; Hoi, Steven C. H.; Xu, Zenglin, Semi-supervised deep embedded clustering, Neurocomputing, 325, 121-130, 2019
[68] (Rossi, Francesca; van Beek, Peter; Walsh, Toby, Handbook of Constraint Programming. Foundations of Artificial Intelligence, August 2006, Elsevier B.V.: Elsevier B.V. Amsterdam, Netherlands) · Zbl 1175.90011
[69] Sang, Tian; Beame, Paul; Kautz, Henry A., Performing bayesian inference by weighted model counting, (AAAI, vol. 5, 2005), 475-481
[70] Sibson, Robin, SLINK: an optimally efficient algorithm for the single-link cluster method, Comput. J., 16, 1, 30-34, 1973
[71] Schroff, Florian; Kalenichenko, Dmitry; Philbin Facenet, James, A unified embedding for face recognition and clustering, 2015, CoRR
[72] Marques Silva, João P.; Sakallah, Karem A., GRASP: a search algorithm for propositional satisfiability, IEEE Trans. Comput., 48, 5, 506-521, 1999 · Zbl 1392.68388
[73] Tseitin, G. S., On the Complexity of Derivation in Propositional Calculus, 466-483, 1983, Springer Berlin Heidelberg: Springer Berlin Heidelberg Berlin, Heidelberg
[74] Uno, Takeaki; Asai, Tatsuya; Uchida, Yuzo; Arimura, Hiroki, LCM: an efficient algorithm for enumerating frequent closed item sets, (Goethals, Bart; Zaki, Mohammed Javeed, FIMI ’03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, 19 December 2003, Melbourne, Florida, USA. FIMI ’03, Frequent Itemset Mining Implementations, Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, 19 December 2003, Melbourne, Florida, USA, CEUR Workshop Proceedings, vol. 90, 2003, CEUR-WS.org)
[75] Van Gansbeke, Wouter; Vandenhende, Simon; Georgoulis, Stamatios; Proesmans, Marc; Van, Luc; Scan, Gool, Learning to classify images without labels, (ECCV, 2020), 268-285
[76] Wagstaff, K.; Cardie, C., Clustering with instance-level constraints, (Proceedings of the 17th International Conference on Machine Learning, 2000), 1103-1110
[77] Wagstaff, K.; Cardie, C.; Rogers, S.; Schroedl, S., Constrained k-means clustering with background knowledge, (Proceedings of the International Conference on Machine Learning, 2001), 577-584
[78] Xie, Junyuan; Girshick, Ross B.; Farhadi, Ali, Unsupervised deep embedding for clustering analysis, (Balcan, Maria-Florina; Weinberger, Kilian Q., Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. Proceedings of the 33nd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016, JMLR Workshop and Conference Proceedings, vol. 48, 2016, JMLR.org), 478-487
[79] Xu, Jingyi; Zhang, Zilu; Friedman, Tal; Liang, Yitao; Broeck, Guy, A semantic loss function for deep learning with symbolic knowledge, (ICML, 2018), 5502-5511
[80] Zhang, Lintao; Malik, Sharad, The quest for efficient boolean satisfiability solvers, (Brinksma, Ed; Larsen, Kim Guldstrand, Computer Aided Verification, 14th International Conference, CAV 2002, Copenhagen, Denmark, July 27-31, 2002, Proceedings. Computer Aided Verification, 14th International Conference, CAV 2002, Copenhagen, Denmark, July 27-31, 2002, Proceedings, Lecture Notes in Computer Science, vol. 2404, 2002, Springer), 17-36 · Zbl 1010.68530
[81] Zhang, Hongjing; Zhan, Tianyang; Basu, Sugato; Davidson, Ian, A framework for deep constrained clustering, Data Min. Knowl. Discov., 35, 2, 593-620, 2021 · Zbl 1473.68182
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.