Document Zbl 1260.68356

Günnemann, Stephan; Boden, Brigitte; Seidl, Thomas

Finding density-based subspace clusters in graphs with feature vectors. (English) Zbl 1260.68356

Data Min. Knowl. Discov. 25, No. 2, 243-269 (2012).

Summary: Data sources representing attribute information in combination with network information are widely available in today’s applications. To realize the full potential for knowledge extraction, mining techniques like clustering should consider both information types simultaneously. Recent clustering approaches combine subspace clustering with dense subgraph mining to identify groups of objects that are similar in subsets of their attributes as well as densely connected within the network. While those approaches successfully circumvent the problem of full-space clustering, their cluster definitions restrict them to clusters of certain shapes. In this work we introduce a density-based cluster definition that takes into account the attribute similarity in subspaces as well as a local graph density, and that enables us to detect clusters of arbitrary shape and size. Furthermore, we avoid redundancy in the result by selecting only the most interesting non-redundant clusters. Based on this model, we introduce the clustering algorithm DB-CSC, which uses a fixed-point iteration method to efficiently determine the clustering solution. We analytically prove the correctness of this fixed-point iteration and derive its complexity. In thorough experiments we demonstrate the strength of DB-CSC in comparison to related approaches.
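The general idea of pruning a candidate set by fixed-point iteration can be illustrated with a small sketch. It is not DB-CSC: it assumes a deliberately simplified, hypothetical density criterion in which an object counts as dense if at least `min_pts` of its still-present graph neighbors lie within maximum-norm distance `eps` of it, measured only on the attribute dimensions listed in `subspace`. Objects failing the criterion are removed and the check is repeated until the set stops changing. All names (`prune_to_fixed_point`, `eps`, `min_pts`, `subspace`) are illustrative assumptions, not taken from the paper.

```python
from typing import Dict, List, Sequence, Set


def prune_to_fixed_point(
    features: Dict[int, Sequence[float]],
    adjacency: Dict[int, Set[int]],
    subspace: List[int],
    eps: float,
    min_pts: int,
) -> Set[int]:
    """Remove non-dense objects repeatedly until the remaining set is stable.

    Hypothetical simplified criterion (not the DB-CSC definition): an object
    is dense if at least `min_pts` of its graph neighbors that are still in the
    set lie within maximum-norm distance `eps` of it, measured only on the
    attribute dimensions listed in `subspace`.
    """

    def close(u: int, v: int) -> bool:
        # Maximum-norm distance restricted to the chosen subspace dimensions.
        dist = max((abs(features[u][d] - features[v][d]) for d in subspace), default=0.0)
        return dist <= eps

    remaining = set(features)
    while True:
        dense = {
            v
            for v in remaining
            if sum(1 for u in adjacency.get(v, set()) if u in remaining and close(u, v)) >= min_pts
        }
        if dense == remaining:  # fixed point: another pass would change nothing
            return remaining
        remaining = dense  # removals may make further objects sparse


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 with a path 2-3-4 hanging off vertex 2.
    feats = {0: (0.0, 1.0), 1: (0.1, 9.0), 2: (0.2, -4.0), 3: (0.1, 0.0), 4: (0.0, 0.0)}
    edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)]
    adj: Dict[int, Set[int]] = {v: set() for v in feats}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    print(sorted(prune_to_fixed_point(feats, adj, subspace=[0], eps=0.5, min_pts=2)))
```

On the toy graph, the first pass removes vertex 4, which leaves vertex 3 with a single neighbor, so the second pass removes it too; the call prints `[0, 1, 2]`. A single pass is therefore not enough, and since each pass can only shrink the set, the loop ends after at most as many passes as there are objects. Using both dimensions (`subspace=[0, 1]`) instead returns an empty set, because dimension 1 scatters the objects; this mirrors, in miniature, why the summary restricts similarity to attribute subspaces.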

Cited in 1 Document

MSC:

68T10	Pattern recognition, speech recognition
68R10	Graph theory (including graph drawing) in computer science

Keywords:

graph clustering; dense subgraphs; networks

Software:

SA-cluster; Inc-cluster

References:

[1]	Aggarwal C, Wang H (2010) Managing and mining graph data. Springer, New York · Zbl 1185.68458
[2]	Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. In: SIGMOD, Seattle, pp 94–105
[3]	Assent I, Krieger R, Müller E, Seidl T (2008) EDSC: efficient density-based subspace clustering. In: CIKM, Napa Valley, pp 1093–1102
[4]	Beyer KS, Goldstein J, Ramakrishnan R, Shaft U (1999) When is “nearest neighbor” meaningful? In: ICDT, Jerusalem, pp 217–235
[5]	Dorogovtsev S, Goltsev A, Mendes J (2006) K-core organization of complex networks. Phys Rev Lett 96(4): 040601 · Zbl 1130.94024
[6]	Du N, Wu B, Pei X, Wang B, Xu L (2007) Community detection in large-scale social networks. In: WebKDD/SNA-KDD, San Jose, pp 16–25
[7]	Ester M, Kriegel HP, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: KDD, Portland, pp 226–231
[8]	Ester M, Ge R, Gao BJ, Hu Z, Ben-Moshe B (2006) Joint cluster analysis of attribute data and relationship data: the connected k-center problem. In: SDM, Bethesda
[9]	Günnemann S, Müller E, Färber I, Seidl T (2009) Detection of orthogonal concepts in subspaces of high dimensional data. In: CIKM, Hong Kong, pp 1317–1326
[10]	Günnemann S, Färber I, Boden B, Seidl T (2010) Subspace clustering meets dense subgraph mining: a synthesis of two paradigms. In: ICDM, Sydney, pp 845–850
[11]	Günnemann S, Kremer H, Seidl T (2010) Subspace clustering for uncertain data. In: SDM, Columbus, pp 385–396
[12]	Günnemann S, Boden B, Seidl T (2011) DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors. In: ECML/PKDD (1), Athens, pp 565–580
[13]	Günnemann S, Färber I, Müller E, Assent I, Seidl T (2011) External evaluation measures for subspace clustering. In: CIKM, Glasgow, pp 1363–1372
[14]	Hanisch D, Zien A, Zimmer R, Lengauer T (2002) Co-clustering of biological networks and gene expression data. Bioinformatics 18(Suppl 1): S145–S154 · doi:10.1093/bioinformatics/18.suppl_1.S145
[15]	Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. In: KDD, New York, pp 58–65
[16]	Janson S, Luczak M (2007) A simple solution to the k-core problem. Random Struct Algorithms 30(1–2): 50–62 · Zbl 1113.05091 · doi:10.1002/rsa.20147
[17]	Kailing K, Kriegel HP, Kröger P (2004) Density-connected subspace clustering for high-dimensional data. In: SDM, Lake Buena Vista, pp 246–257
[18]	Kriegel HP, Kröger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1): 1–58 · doi:10.1145/1497577.1497578
[19]	Kubica J, Moore AW, Schneider JG (2003) Tractable group detection on large link data sets. In: ICDM, Melbourne, FL, pp 573–576
[20]	Long B, Wu X, Zhang ZM, Yu PS (2006) Unsupervised learning on k-partite graphs. In: KDD, Philadelphia, pp 317–326
[21]	Long B, Zhang ZM, Yu PS (2007) A probabilistic framework for relational clustering. In: KDD, San Jose, pp 470–479
[22]	Moise G, Sander J (2008) Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering. In: KDD, Las Vegas, pp 533–541
[23]	Moser F, Colak R, Rafiey A, Ester M (2009) Mining cohesive patterns from graphs with feature vectors. In: SDM, Sparks, pp 593–604
[24]	Müller E, Assent I, Günnemann S, Krieger R, Seidl T (2009) Relevant subspace clustering: mining the most interesting non-redundant concepts in high dimensional data. In: ICDM, Miami, pp 377–386
[25]	Müller E, Günnemann S, Assent I, Seidl T (2009) Evaluating clustering in subspace projections of high dimensional data. In: VLDB, Lyon, pp 1270–1281
[26]	Parsons L, Haque E, Liu H (2004) Subspace clustering for high dimensional data: a review. SIGKDD Explor 6(1): 90–105 · doi:10.1145/1007730.1007731
[27]	Pei J, Jiang D, Zhang A (2005) On mining cross-graph quasi-cliques. In: KDD, Chicago, pp 228–238
[28]	Ruan J, Zhang W (2007) An efficient spectral algorithm for network community discovery and its applications to biological and social networks. In: ICDM, Omaha, pp 643–648
[29]	Ulitsky I, Shamir R (2007) Identification of functional modules using network topology and high-throughput data. BMC Syst Biol 1(1): 8 · doi:10.1186/1752-0509-1-8
[30]	Zhou Y, Cheng H, Yu JX (2009) Graph clustering based on structural/attribute similarities. PVLDB 2(1): 718–729
[31]	Zhou Y, Cheng H, Yu JX (2010) Clustering large attributed graphs: an efficient incremental approach. In: ICDM, Sydney, pp 689–698