Skip to main content
Log in

Fast affinity propagation clustering based on incomplete similarity matrix

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Affinity propagation (AP) is a recently proposed clustering algorithm, which has been successful used in a lot of practical problems. Although effective in finding meaningful clustering solutions, a key disadvantage of AP is its efficiency, which has become the bottleneck when applying AP for large-scale problems. In the literature, most of the methods proposed to improve the efficiency of AP are based on implementing the message-passing on a sparse similarity matrix, while neither the decline in effectiveness nor the improvement in efficiency is theoretically analyzed. In this paper, we propose a two-stage fast affinity propagation (FastAP) algorithm. Different from previous work, the scale of the similarity matrix is first compressed by selecting only potential exemplars, then further reduced by sparseness according to k nearest neighbors. More importantly, we provide theoretical analysis, based on which the improvement of efficiency in our method is controllable with guaranteed clustering performance. In experiments, two synthetic data sets, seven publicly available data sets, and two real-world streaming data sets are used to evaluate the proposed method. The results demonstrate that FastAP can achieve comparable clustering performances with the original AP algorithm, while the computational efficiency has been improved with a several-fold speed-up on small data sets and a dozens-of-fold on larger-scale data sets.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7

Similar content being viewed by others

Notes

  1. https://www.kaggle.com/c/seizure-prediction.

References

  1. Barbakh W, Wu Y, Fyfe C (2009) Review of clustering algorithms. Non-standard parameter adaptation for exploratory data analysis, studies in computational intelligence, vol 249. Springer, Berlin, pp 7–28

    Chapter  Google Scholar 

  2. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396

    Article  MATH  Google Scholar 

  3. Borg I, Groenen P (2005) Modern multidimensional scaling: theory and applications, 2nd edn. Springer series in statistics. Springer, Berlin

  4. Chisci L, Mavino A, Perferi G, Sciandrone M, Anile C, Colicchio G, Fuggetta F (2010) Real time epileptic seizure prediction using AR models and support vector machines. IEEE Trans Biomed Eng 57(5):1124–1132

    Article  Google Scholar 

  5. Drezner Z, Hamacher HW (2002) Facility location: applications and theory. Springer, Berlin

    Book  MATH  Google Scholar 

  6. Duda RO, Hart PE, Stork DG (2000) Pattern classification, 2nd edn. Wiley, New York

    MATH  Google Scholar 

  7. Dueck D (2009) Affinity propagation: clustering data by passing messages. Canadian thesis, University of Toronto

  8. EPILEPSIAE (2008) The Freiburg EEG database. http://epilepsy.uni-freiburg.de/freiburg-seizure-prediction-project/eeg-database

  9. Farinelli A, Denitto M, Bicego M (2011) Biclustering of expression microarray data using affinity propagation. In: Proceedings of the 6th international conference on pattern recognition in bioinformatics (PRIB’11). Springer, Berlin, pp 13–24

  10. Frey BJ, Dueck D (2007) Clustering by passing messages between data points. Science 315(5814):972–976

    Article  MathSciNet  MATH  Google Scholar 

  11. Frey BJ, Dueck D (2008) Response to comment on “clustering by passing messages between data points”. Science 319(5864):726

    Article  Google Scholar 

  12. Fujiwara Y, Irie G, Kitahara T (2011) Fast algorithm for affinity propagation. In: Proceedings of the twenty-second international joint conference on artificial intelligence (IJCAI’11). IJCAI/AAAI, Barcelona, Spain, pp 2238–2243

  13. Givoni IE (2012) Beyond affinity propagation: message passing algorithms for clustering. Canadian thesis, University of Toronto

  14. Givoni IE, Frey BJ (2009) Semi-supervised affinity propagation with instance-level constraints. In: Dyk DAV, Welling M (eds) JMLR proceedings of the 12th international conference on artificial intelligence and statistics. JMLR.org, Clearwater, FL, USA, pp 161–168

  15. Givoni IE, Chung C, Frey BJ (2011) Hierarchical affinity propagation. In: 27th Conference on uncertainty in artificial intelligence (UAI’11), Barcelona, Spain

  16. Guan R, Shi X, Marchese M, Yang C, Liang Y (2011) Text clustering with seeds ffinity propagation. IEEE Trans Knowl Data Eng 23(4):627–637

    Article  Google Scholar 

  17. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, Los Altos

    MATH  Google Scholar 

  18. Jain AK (2010) Data clustering: 50 years beyond k-means. Pattern Recognit Lett 31(8):651–666

    Article  Google Scholar 

  19. Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37

    Article  Google Scholar 

  20. Jia C, Jiang Y, Yu J (2010) Affinity propagation on identifying communities in social and biological networks. Knowledge science, engineering and management, vol 6291., Lecture notes in computer scienceSpringer, Berlin, pp 597–602

    Chapter  Google Scholar 

  21. Jia Y, Wang J, Zhang C, Hua XS (2008) Finding image exemplars using fast sparse affinity propagation. In: Proceedings of the 16th ACM international conference on multimedia (MM ’08). ACM, New York, USA, pp 639–642

  22. Kariv O, Hakimi S (1979) An algorithmic approach to network location problems: the p-centers. SIAM J Appl Math 37(3):513–538

    Article  MathSciNet  MATH  Google Scholar 

  23. Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York

    Book  MATH  Google Scholar 

  24. Kazantseva A, Szpakowicz S (2011) Linear text segmentation using affinity propagation. Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, Stroudsburg, pp 284–293

    Google Scholar 

  25. Kschischang FR, Frey BJ, Loeliger HA (2001) Factor graphs and the sum-product algorithm. IEEE Trans Inf Theory 47(2):498–519

    Article  MathSciNet  MATH  Google Scholar 

  26. Lai D, Nardini C, Lu H (2011) Partitioning networks into communities by message passing. Phys Rev E 83(016):115

    MathSciNet  Google Scholar 

  27. Leone M, Sumedha Weigt M (2007) Clustering by soft-constraint affinity propagation: applications to gene-expression data. Bioinformatics 23(20):2708–2715

    Article  Google Scholar 

  28. MacQueen JB (1967) Some methods for classification and analysis of multivariate observations. In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, vol 1. University of California Press, pp 281–297

  29. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326

    Article  Google Scholar 

  30. Sun L, Guo C (2014) Incremental affinity propagation clustering based on message passing. IEEE Trans Knowl Data Eng 26(11):2731–2744

    Article  Google Scholar 

  31. Vinh NX, Epps J, Bailey J (2009) Information theoretic measures for clusterings comparison: Is a correction for chance necessary? In: Proceedings of the 26th annual international conference on machine learning (ICML ’09). ACM, pp 1073–1080

  32. Xiao J, Wang J, Tan P, Quan L (2007) Joint affinity propagation for multiple view segmentation. In: IEEE 11th international conference on computer vision (ICCV ’07). IEEE, pp 1–7

  33. Xu R, Wunsch D II (2005) Survey of clustering algorithms. IEEE Trans Neur Netw 16(3):645–678

    Article  Google Scholar 

  34. Zhang K, Kwok JT (2010) Clustered nystrom method for large scale manifold learning and dimension reduction. IEEE Trans Neur Netw 21(10):1576–1587

    Article  Google Scholar 

Download references

Acknowledgments

This work was partly supported by the Natural Science Foundation of China under Grant (Nos. 71171030 and 71501023) and partly by Foundation for Innovative Research Groups of the National Natural Science Foundation of China (No. 71421001).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Chonghui Guo.

Appendix

Appendix

See Table 7.

Table 7 Computational experiment on uniformly distributed data sets

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Sun, L., Guo, C., Liu, C. et al. Fast affinity propagation clustering based on incomplete similarity matrix. Knowl Inf Syst 51, 941–963 (2017). https://doi.org/10.1007/s10115-016-0996-y

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-016-0996-y

Keywords

Navigation