×

Ensemble clustering using factor graph. (English) Zbl 1395.62157

Summary: In this paper, we propose a new ensemble clustering approach termed ensemble clustering using factor graph (ECFG). Compared to the existing approaches, our approach has three main advantages: (1) the cluster number is obtained automatically and need not to be specified in advance; (2) the reliability of each base clustering can be estimated in an unsupervised manner and exploited in the consensus process; (3) our approach is efficient for processing ensembles with large data sizes and large ensemble sizes. In this paper, we introduce the concept of super-object, which serves as a compact and adaptive representation for the ensemble data and significantly facilitates the computation. Through the probabilistic formulation, we cast the ensemble clustering problem into a binary linear programming (BLP) problem. The BLP problem is NP-hard. To solve this optimization problem, we propose an efficient solver based on factor graph. The constrained objective function is represented as a factor graph and the max-product belief propagation is utilized to generate the solution insensitive to initialization and converged to the neighborhood maximum. Extensive experiments are conducted on multiple real-world datasets, which demonstrate the effectiveness and efficiency of our approach against the state-of-the-art approaches.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
Full Text: DOI

References:

[1] Fred, A. L.N.; Jain, A. K., Combining multiple clusterings using evidence accumulation, IEEE Trans. Pattern Anal. Mach. Intell., 27, 6, 835-850, (2005)
[2] Vega-Pons, S.; Correa-Morris, J.; Ruiz-Shulcloper, J., Weighted partition consensus via kernels, Pattern Recognit., 43, 8, 2712-2724, (2010) · Zbl 1207.68324
[3] Iam-On, N.; Boongoen, T.; Garrett, S.; Price, C., A link-based approach to the cluster ensemble problem, IEEE Trans. Pattern Anal. Mach. Intell., 33, 12, 2396-2409, (2011)
[4] Yu, Z.; Wong, H.-S.; You, J.; Yu, G.; Han, G., Hybrid cluster ensemble framework based on the random combination of data transformation operators, Pattern Recognit., 45, 5, 1826-1837, (2012) · Zbl 1233.68198
[5] J. Yi, T. Yang, R. Jin, A.K. Jain, Robust ensemble clustering by matrix completion, in: ICDM, 2012, pp. 1176-1181.
[6] D. Huang, J.-H. Lai, C.-D. Wang, Exploiting the wisdom of crowd: a multi-granularity approach to clustering ensemble, in: IScIDE, 2013, pp. 112-119.
[7] Y. Ren, C. Domeniconi, G. Zhang, G. Yu, Weighted-object ensemble clustering, in: ICDM, 2013, pp. 627-636.
[8] Franek, L.; Jiang, X., Ensemble clustering by means of clustering embedding in vector spaces, Pattern Recognit., 47, 2, 833-842, (2014) · Zbl 1326.62137
[9] Yu, Z.; Li, L.; Gao, Y.; You, J.; Liu, J.; Wong, H.-S.; Han, G., Hybrid clustering solution selection strategy, Pattern Recognit., 47, 10, 3362-3375, (2014)
[10] Huang, D.; Lai, J.-H.; Wang, C.-D., Combining multiple clusterings via crowd agreement estimation and multi-granularity link analysis, Neurocomput., 170, 240-250, (2015)
[11] Vega-Pons, S.; Ruiz-Shulcloper, J., A survey of clustering ensemble algorithms, Int. J. Pattern Recognit. Artif. Intell., 25, 3, 337-372, (2011)
[12] T. Li, C. Ding, Weighted consensus clustering, in: SDM, 2008, pp. 798-809.
[13] Mimaroglu, S.; Aksehirli, E., Diclensdivisive clustering ensemble with automatic cluster number, IEEE/ACM Trans. Comput. Biol. Bioinform., 9, 2, 408-420, (2012)
[14] Mimaroglu, S.; Erdil, E., An efficient and scalable family of algorithms for combining clusterings, Eng. Appl. Artif. Intell., 26, 10, 2525-2539, (2013)
[15] Kschischang, F. R.; Frey, B. J.; Loeliger, H.-A., Factor graphs and the sum-product algorithm, IEEE Trans. Inf. Theory, 47, 2, 498-519, (2001) · Zbl 0998.68234
[16] Frey, B. J.; Dueck, D., Clustering by passing messages between data points, Science, 315, 972-976, (2007) · Zbl 1226.94027
[17] Shang, F.; Jiao, L. C.; Shi, J.; Wang, F.; Gong, M., Fast affinity propagation clusteringa multilevel approach, Pattern Recognit., 45, 1, 474-486, (2012)
[18] Zeng, J.; Cheung, W. K.; Liu, J., Learning topic models by belief propagation, IEEE Trans. Pattern Anal. Mach. Intell., 35, 5, 1121-1134, (2013)
[19] Wang, C.-D.; Lai, J.-H.; Suen, C. Y.; Zhu, J.-Y., Multi-exemplar affinity propagation, IEEE Trans. Pattern Anal. Mach. Intell., 35, 9, 2223-2237, (2013)
[20] X. Z. Fern, C. E. Brodley, Solving cluster ensemble problems by bipartite graph partitioning, in: ICML, 2004, pp. 36-43.
[21] Wang, X.; Yang, C.; Zhou, J., Clustering aggregation by probability accumulation, Pattern Recognit., 42, 5, 668-675, (2009) · Zbl 1162.68636
[22] Cristofor, D.; Simovici, D., Finding Median partitions using information-theoretical-based genetic algorithms, J. Univers. Comput. Sci., 8, 2, 153-172, (2002) · Zbl 1258.68048
[23] Strehl, A.; Ghosh, J., Cluster ensemblesa knowledge reuse framework for combining multiple partitions, J. Mach. Learn. Res., 3, 583-617, (2003) · Zbl 1084.68759
[24] V. Singh, L. Mukherjee, J. Peng, J. Xu, Ensemble clustering using semidefinite programming, in: NIPS, 2007, pp. 1353-1360.
[25] Topchy, A.; Jain, A. K.; Punch, W., Clustering ensemblesmodels of consensus and weak partitions, IEEE Trans. Pattern Anal. Mach. Intell., 27, 12, 1866-1881, (2005)
[26] Weiszfeld, E.; Plastria, F., On the point for which the sum of the distances to n given points is minimum, Ann. Oper. Res., 167, 1, 7-41, (2009) · Zbl 1176.90616
[27] Mimaroglu, S.; Erdil, E., Combining multiple clusterings using similarity graph, Pattern Recognit., 44, 3, 694-703, (2011) · Zbl 1209.68474
[28] Alush, A.; Goldberger, J., Ensemble segmentation using efficient integer linear programming, IEEE Trans. Pattern Anal. Mach. Intell., 34, 10, 1966-1977, (2012)
[29] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc. Ser. B, 39, 1, 1-38, (1977) · Zbl 0364.62022
[30] Papadimitriou, C. H.; Steiglitz, K., Combinatorial optimization: algorithms and complexity, (1998), Dover Publications New York · Zbl 0944.90066
[31] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 11, 2278-2324, (1998)
[32] K. Bache, M. Lichman, UCI machine learning repository. Available online 〈http://archive.ics.uci.edu/ml〉, 2013.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.