×

Probabilistic community detection with unknown number of communities. (English) Zbl 1420.62271

Summary: A fundamental problem in network analysis is clustering the nodes into groups which share a similar connectivity pattern. Existing algorithms for community detection assume the knowledge of the number of clusters or estimate it a priori using various selection criteria and subsequently estimate the community structure. Ignoring the uncertainty in the first stage may lead to erroneous clustering, particularly when the community structure is vague. We instead propose a coherent probabilistic framework for simultaneous estimation of the number of communities and the community structure, adapting recently developed Bayesian nonparametric techniques to network models. An efficient Markov chain Monte Carlo (MCMC) algorithm is proposed which obviates the need to perform reversible jump MCMC on the number of clusters. The methodology is shown to outperform recently developed community detection algorithms in a variety of synthetic data examples and in benchmark real-datasets. Using an appropriate metric on the space of all configurations, we develop nonasymptotic Bayes risk bounds even when the number of clusters is unknown. Enroute, we develop concentration properties of nonlinear functions of Bernoulli random variables, which may be of independent interest in analysis of related models.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
62F15 Bayesian inference
62G10 Nonparametric hypothesis testing
91C20 Clustering in the social and behavioral sciences

References:

[1] Abbe, E.; Sandon, C., Community Detection in the General Stochastic Block Model: Fundamental Limits and Efficient Algorithms for Recovery,, Proceedings of 56th Annual IEEE Symposium on Foundations of Computer Science, Berkeley, CA, 670-688 (2015)
[2] ———, Detection in the Stochastic Block Model with Multiple Clusters: Proof of the Achievability Conjectures, Acyclic BP, and the Information-Computation Gap, (2015)
[3] ———, Recovering Communities in the General Stochastic Block Model without knowing the Parameters,, Advances in Neural Information Processing Systems, 676-684 (2015), Red Hook, NY: Curran Associates, Inc., Red Hook, NY
[4] Airoldi, E. M.; Blei, D. M.; Fienberg, S. E.; Xing, E. P., Mixed Membership Stochastic Blockmodels,, Journal of Machine Learning Research, 9, 1981-2014 (2008) · Zbl 1225.68143
[5] Aldous, D. J., Exchangeability and Related Topics (1985), New York: Springer, New York · Zbl 0562.60042
[6] Amini, A. A.; Chen, A.; Bickel, P. J.; Levina, E., Pseudo-Likelihood Methods for Community Detection in Large Sparse Networks, The Annals of Statistics, 41, 2097-2122 (2013) · Zbl 1277.62166
[7] Bickel, P.; Chen, A., A Nonparametric View of Network Models and Newman-Girvan and Other Modularities, Proceedings of the National Academy of Sciences, 106, 21068-21073 (2009) · Zbl 1359.62411
[8] Blackwell, D.; Macqueen, J. B., Ferguson Distributions via Pólya urn Schemes, The Annals of Statistics, 1, 353-355 (1973) · Zbl 0276.62010
[9] Blondel, V. D.; Guillaume, J.-L.; Lambiotte, R.; Lefebvre, E., Fast Unfolding of Communities in Large Networks, Journal of Statistical Mechanics: Theory and Experiment, 2008, P10008 (2008) · Zbl 1459.91130
[10] Castillo, I.; Schmidt-Hieber, J.; Van Der Vaart, A. W., Bayesian Linear Regression with Sparse Priors, The Annals of Statistics, 43, 1986-2018 (2015) · Zbl 1486.62197
[11] Dahl, D. B., Modal Clustering in a Class of Product Partition Models, Bayesian Analysis, 4, 243-264 (2009) · Zbl 1330.62248
[12] Daudin, J. J.; Picard, F.; Robin, S., A Mixture Model for Random Graphs, Statistics and Computing, 18, 173-183 (2008)
[13] Drton, M.; Plummer, M., A Bayesian Information Criterion for Singular Models,, Journal of the Royal Statistical Society, 79, 323-380 (2017) · Zbl 1414.62088
[14] Gao, C.; Ma, Z.; Zhang, A. Y.; Zhou, H. H., Achieving Optimal Misclassification Proportion in Stochastic Block Model,, The Journal of Machine Learning Research, 18, 1980-2024 (2017)
[15] Goldenberg, A.; Zheng, A.; Fienberg, S.; Airoldi, E., A Survey of Statistical Network Models, Foundations and Trends® in Machine Learning, 2, 129-233 (2010) · Zbl 1184.68030
[16] Goldenberg, J.; Libai, B.; Muller, E., Talk of the Network: A Complex Systems look at the Underlying Process of Word-Of-Mouth, Marketing Letters, 12, 211-223 (2001)
[17] Green, P. J., Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination, Biometrika, 82, 711-732 (1995) · Zbl 0861.62023
[18] Holland, P. W.; Laskey, K. B.; Leinhardt, S., Stochastic Blockmodels: First Steps, Social Networks, 5, 109-137 (1983)
[19] Johnson, V. E.; Rossell, D., Bayesian Model Selection in High-Dimensional Settings, Journal of the American Statistical Association, 107, 649-660 (2012) · Zbl 1261.62024
[20] Karrer, B.; Newman, M. E. J., Stochastic Blockmodels and Community Structure in Networks, Physical Review E, 83, 016107 (2011)
[21] Kruijer, W.; Rousseau, J.; Van Der Vaart, A., Adaptive Bayesian Density Estimation with Location-Scale Mixtures, Electronic Journal of Statistics, 4, 1225-1257 (2010) · Zbl 1329.62188
[22] Latouche, P.; Birmele, E.; Ambroise, C., Variational Bayesian Inference and Complexity Control for Stochastic Block Models, Statistical Modelling, 12, 93-115 (2012) · Zbl 1420.62114
[23] Le, C. M.; Levina, E., Estimating the Number of Communities in Networks by Spectral Methods, (2015)
[24] Lusseau, D.; Newman, M., Identifying the Role that Animals Play in their Social Networks, Proceedings of the Royal Society of London B: Biological Sciences, 271, S477-S481 (2004)
[25] Lusseau, D.; Schneider, K.; Boisseau, O.; Haase, P.; Slooten, E.; Dawson, S., The Bottlenose Dolphin Community of Doubtful Sound features a Large Proportion of Long-Lasting Associations, Behavioral Ecology and Sociobiology, 54, 396-405 (2003)
[26] Mcdaid, A.; Murphy, T. B.; Friel, N.; Hurley, N., Improved Bayesian inference for the Stochastic Block Model with Application to Large Networks, Computational Statistics & Data Analysis, 60, 12-31 (2013) · Zbl 1365.62241
[27] Miller, J. W., Nonparametric and Variable-Dimension Bayesian Mixture Models: Analysis, Comparison, and New Methods, (2014)
[28] Miller, J. W.; Harrison, M. T., Mixture Models with a Prior on the Number of Components,, Journal of the American Statistical Association, 113, 340-356 (2017) · Zbl 1398.62066
[29] Narisetty, N. N.; He, X., Bayesian Variable Selection with Shrinking and Diffusing Priors, The Annals of Statistics, 42, 789-817 (2014) · Zbl 1302.62158
[30] Neal, R. M., Markov Chain Sampling Methods for Dirichlet Process Mixture Models, Journal of Computational and Graphical Statistics, 9, 249-265 (2000)
[31] Newman, M. E., Detecting Community Structure in Networks, The European Physical Journal B-Condensed Matter and Complex Systems, 38, 321-330 (2004)
[32] ———, Finding Community Structure in Networks using the Eigenvectors of Matrices, Physical review E, 74, 036104 (2006)
[33] Newman, M. E. J., Communities, Modules and Large-Scale Structure in Networks, Nature Physics, 8, 25-31 (2012)
[34] Newman, M. E.; Girvan, M., Finding and Evaluating Community Structure in Networks, Physical Review E, 69, 026113 (2004)
[35] Newman, M.; Reinert, G., Estimating the Number of Communities in a Network, (2016)
[36] Nobile, A.; Fearnside, A. T., Bayesian Finite Mixtures with an Unknown Number of Components: The Allocation Sampler, Statistics and Computing, 17, 147-162 (2007)
[37] Nowicki, K.; Snijders, T. A. B., Estimation and Prediction for Stochastic Blockstructures, Journal of the American Statistical Association, 96, 1077-1087 (2001) · Zbl 1072.62542
[38] Pitman, J., Exchangeable and Partially Exchangeable Random Partitions, Probability Theory and Related Fields, 102, 145-158 (1995) · Zbl 0821.60047
[39] Rand, W. M., Objective Criteria for the Evaluation of Clustering Methods, Journal of the American Statistical Association, 66, 846-850 (1971)
[40] Rohe, K.; Chatterjee, S.; Yu, B., Spectral Clustering and the High-Dimensional Stochastic Blockmodel, The Annals of Statistics, 39, 1878-1915 (2011) · Zbl 1227.62042
[41] Rousseau, J.; Mengersen, K., Asymptotic Behaviour of the Posterior Distribution in Overfitted Mixture Models, (2011) · Zbl 1228.62034
[42] Saldana, D. F.; Yu, Y.; Feng, Y., How Many Communities Are There?, Journal of Computational and Graphical Statistics, 26, 171-181 (2015)
[43] Sethuraman, J., A Constructive Definition of Dirichlet Priors, Statistica Sinica, 4, 639-650 (1994) · Zbl 0823.62007
[44] Shi, J.; Malik, J., Normalized Cuts and Image Segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888-905 (2000)
[45] Shin, M.; Bhattacharya, A.; Johnson, V. E., Scalable Bayesian Variable Selection Using Nonlocal Prior Densities in Ultrahigh-Dimensional Settings, Statistica Sinica, 28, 1053-1078 (2018) · Zbl 1390.62125
[46] Snijders, T. A. B.; Nowicki, K., Estimation and Prediction for Stochastic Blockmodels for Graphs with Latent Block Structure, Journal of classification, 14, 75-100 (1997) · Zbl 0896.62063
[47] Vershynin, R., Introduction to the Non-Asymptotic Analysis of Random Matrices,, Compressed Sensing: Theory and Applications (2012), Cambridge, UK: Cambridge University Press, Cambridge, UK
[48] Wang, Y. X.; Bickel, P. J., Likelihood-Based Model Selection for Stochastic Block Models,, 45, 500-528 (2017) · Zbl 1371.62017
[49] White, S.; Smyth, P., A Spectral Clustering Approach To Finding Communities in Graph,, SDM, 5, 76-84 (2005)
[50] Zanghi, H.; Ambroise, C.; Miele, V., Fast Online Graph Clustering via Erdős-Rényi mixture, Pattern Recognition, 41, 3592-3599 (2008) · Zbl 1151.68623
[51] Zhang, A. Y.; Zhou, H. H., Minimax Rates of Community Detection in Stochastic Block Models, The Annals of Statistics, 44, 2252-2280 (2016) · Zbl 1355.60125
[52] Zhang, S.; Wang, R.-S.; Zhang, X.-S., Identification of Overlapping Community Structure in Complex Networks using fuzzy c-Means Clustering, Physica A: Statistical Mechanics and its Applications, 374, 483-490 (2007)
[53] Zhao, Y.; Levina, E.; Zhu, J., Community Extraction for Social Networks, Proceedings of the National Academy of Sciences, 108, 7321-7326 (2011)
[54] ———, Consistency of Community Detection in Networks Under Degree-Corrected Stochastic Block Models, The Annals of Statistics, 40, 2266-2292 (2012) · Zbl 1257.62095
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.