×

A goodness-of-fit test for stochastic block models. (English) Zbl 1331.62283

Summary: The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the largest singular value of a residual matrix obtained by subtracting the estimated block mean effect from the adjacency matrix. Asymptotic null distribution is obtained using recent advances in random matrix theory. The test is proved to have full power against alternative models with finer structures. These results naturally lead to a consistent sequential testing estimate of the number of communities.

MSC:

62H15 Hypothesis testing in multivariate analysis

References:

[1] Abbe, E., Bandeira, A. S. and Hall, G. (2014). Exact recovery in the stochastic block model. Preprint. Available at . arXiv:1405.3267 · Zbl 1359.94047 · doi:10.1109/TIT.2015.2490670
[2] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3 rd International Workshop on Link Discovery 36-43. ACM, New York.
[3] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels. J. Mach. Learn. Res. 9 1981-2014. · Zbl 1225.68143
[4] Amini, A. A. and Levina, E. (2014). On semidefinite relaxations for the block model. Preprint. Available at . arXiv:1406.5647
[5] Anandkumar, A., Ge, R., Hsu, D. and Kakade, S. M. (2014). A tensor approach to learning mixed membership community models. J. Mach. Learn. Res. 15 2239-2312. · Zbl 1318.68136
[6] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068-21073. · Zbl 1194.62125 · doi:10.1214/10-AOAS361
[7] Bickel, P. J. and Sarkar, P. (2013). Hypothesis testing for automated community detection in networks. Preprint. Available at . arXiv:1311.2694
[8] Bloemendal, A., Erdős, L., Knowles, A., Yau, H.-T. and Yin, J. (2014). Isotropic local laws for sample covariance and generalized Wigner matrices. Electron. J. Probab. 19 1-53. · Zbl 1288.15044 · doi:10.1214/EJP.v19-3054
[9] Chatterjee, S. (2015). Matrix estimation by universal singular value thresholding. Ann. Statist. 43 177-214. · Zbl 1308.62038 · doi:10.1214/14-AOS1272
[10] Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. J. Mach. Learn. Res. Workshop Conf. Proc. 2012 35.1-35.23.
[11] Chen, K. and Lei, J. (2014). Network cross-validation for determining the number of communities in network data. Preprint. Available at . arXiv:1411.1715
[12] Chen, Y., Sanghavi, S. and Xu, H. (2012). Clustering sparse graphs. In Advances in Neural Information Processing Systems 25 (F. Pereira, C. J. C. Burges, L. Bottou and K. Q. Weinberger, eds.) 2204-2212. Curran Associates, Red Hook, NY.
[13] Decelle, A., Krzakala, F., Moore, C. and Zdeborová, L. (2011). Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E (3) 84 066106.
[14] Erdős, L., Yau, H.-T. and Yin, J. (2012). Rigidity of eigenvalues of generalized Wigner matrices. Adv. Math. 229 1435-1515. · Zbl 1238.15017 · doi:10.1016/j.aim.2011.12.010
[15] Erdős, L., Knowles, A., Yau, H.-T. and Yin, J. (2012). Spectral statistics of Erdős-Rényi Graphs II: Eigenvalue spacing and the extreme eigenvalues. Comm. Math. Phys. 314 587-640. · Zbl 1251.05162 · doi:10.1007/s00220-012-1527-7
[16] Erdős, L., Knowles, A., Yau, H.-T. and Yin, J. (2013a). The local semicircle law for a general class of random matrices. Electron. J. Probab. 18 no. 59, 58. · Zbl 1373.15053 · doi:10.1214/EJP.v18-2473
[17] Erdős, L., Knowles, A., Yau, H.-T. and Yin, J. (2013b). Spectral statistics of Erdős-Rényi graphs I: Local semicircle law. Ann. Probab. 41 2279-2375. · Zbl 1272.05111 · doi:10.1214/11-AOP734
[18] Fishkind, D. E., Sussman, D. L., Tang, M., Vogelstein, J. T. and Priebe, C. E. (2013). Consistent adjacency-spectral partitioning for the stochastic block model when the model parameters are unknown. SIAM J. Matrix Anal. Appl. 34 23-39. · Zbl 1314.05186 · doi:10.1137/120875600
[19] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109-137. · doi:10.1016/0378-8733(83)90021-7
[20] Jin, J. (2012). Fast community detection by SCORE. Available at . arXiv:1211.5803 · Zbl 1310.62076 · doi:10.1214/14-AOS1265
[21] Kargin, V. (2014). On the singular values of the reduced-rank multivariate response regression. Preprint. Available at . arXiv:1409.6779
[22] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10. · doi:10.1103/PhysRevE.83.016107
[23] Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L. and Zhang, P. (2013). Spectral redemption in clustering sparse networks. Proc. Natl. Acad. Sci. USA 110 20935-20940. · Zbl 1359.62252 · doi:10.1073/pnas.1312486110
[24] Lee, J. O. and Yin, J. (2014). A necessary and sufficient condition for edge universality of Wigner matrices. Duke Math. J. 163 117-173. · Zbl 1296.60007 · doi:10.1215/00127094-2414767
[25] Lei, J. and Rinaldo, A. (2013). Consistency of spectral clustering in stochastic block models. Preprint. Available at . arXiv:1312.2050 · Zbl 1308.62041 · doi:10.1214/14-AOS1274
[26] Lei, J. and Zhu, L. (2014). A generic sample splitting approach for refined community recovery in stochastic block models. Preprint. Available at . arXiv:1411.1469
[27] Massoulie, L. (2013). Community detection thresholds and the weak Ramanujan property. Preprint. Available at . arXiv:1311.3085
[28] McSherry, F. (2001). Spectral partitioning of random graphs. In 42 nd IEEE Symposium on Foundations of Computer Science ( Las Vegas , NV , 2001) 529-537. IEEE Computer Soc., Los Alamitos, CA.
[29] Montanari, A., Reichman, D. and Zeitouni, O. (2014). On the limitation of spectral methods: From the Gaussian hidden clique problem to rank one perturbations of Gaussian tensors. Preprint. Available at . arXiv:1411.6149 · Zbl 1366.94150
[30] Mossel, E., Neeman, J. and Sly, A. (2013). A proof of the block model threshold conjecture. Preprint. Available at . arXiv:1311.4115
[31] Newman, M. E. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577-8582.
[32] Newman, M. E. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E (3) 69 026113. · Zbl 1032.91716
[33] Saldana, D. F., Yi, Y. and Feng, Y. (2014). How many communities are there? Preprint. Available at . arXiv:1412.1684
[34] Vu, V. (2014). A simple SVD algorithm for finding hidden partitions. Preprint. Available at . arXiv:1404.3918 · Zbl 1310.62076
[35] Wolfe, P. J. and Olhede, S. C. (2013). Nonparametric graphon estimation. Preprint. Available at . arXiv:1309.5936
[36] Yan, X., Shalizi, C., Jensen, J. E., Krzakala, F., Moore, C., Zdeborová, L., Zhang, P. and Zhu, Y. (2014). Model selection for degree-corrected block models. J. Stat. Mech. Theory Exp. 2014 P05007.
[37] Zhao, Y., Levina, E. and Zhu, J. (2011). Community extraction for social networks. Proc. Natl. Acad. Sci. USA 108 7321-7326.
[38] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266-2292. · Zbl 1257.62095 · doi:10.1214/12-AOS1036
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.