×

Improvements on SCORE, especially for weak signals. (English) Zbl 1490.62158

Summary: A network may have weak signals and severe degree heterogeneity, and may be very sparse in one occurrence but very dense in another. SCORE [J. Jin, Ann. Stat. 43, No. 1, 57–89 (2015; Zbl 1310.62076)] is a recent approach to network community detection. It accommodates severe degree heterogeneity and is adaptive to different levels of sparsity, but its performance for networks with weak signals is unclear. In this paper, we show that in a broad class of network settings where we allow for weak signals, severe degree heterogeneity, and a wide range of network sparsity, SCORE achieves prefect clustering and has the so-called “exponential rate” in Hamming clustering errors. The proof uses the most recent advancement on entry-wise bounds for the leading eigenvectors of the network adjacency matrix. The theoretical analysis assures us that SCORE continues to work well in the weak signal settings, but it does not rule out the possibility that SCORE may be further improved to have better performance in real applications, especially for networks with weak signals. As a second contribution of the paper, we propose SCORE+ as an improved version of SCORE. We investigate SCORE+ with 8 network data sets and found that it outperforms several representative approaches. In particular, for the 6 data sets with relatively strong signals, SCORE+ has similar performance as that of SCORE, but for the 2 data sets (Simmons, Caltech) with possibly weak signals, SCORE+ has much lower error rates. SCORE+ proposes several changes to SCORE. We carefully explain the rationale underlying each of these changes, using a mixture of theoretical and numerical study.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
91C20 Clustering in the social and behavioral sciences
62P25 Applications of statistics to social sciences

Citations:

Zbl 1310.62076

References:

[1] Abbe, E., Fan, J., Wang, K. and Zhong, Y. (2019). Entrywise eigenvector analysis of random matrices with low expected rank. Ann. Statist. (to appear).
[2] Adamic, L A and Glance, N (2005). The political blogosphere and the 2004 US election: divided they blog. In Proceedings of the 3rd international workshop on Link discovery, pp. 36-43.
[3] Airoldi, E.; Blei, D.; Fienberg, S.; Xing, E., Mixed membership stochastic blockmodels, J. Mach. Learn. Res., 9, 1981-2014 (2008) · Zbl 1225.68143
[4] Bickel, P. J. and Chen, A (2009).
[5] Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the 25th annual conference on learning theory, JMLR workshop and conference proceedings, vol. 23, pp. 1-35.
[6] Chen, Y.; Li, X.; Xu, J., Convexified modularity maximization for degree-corrected stochastic block models, Ann. Statist., 46, 1573-1602 (2018) · Zbl 1410.62105
[7] Duan, Y., Ke, Z. T. and Wang, M. (2018). State aggregation learning from Markov transition data. In NIPS workshop on probabilistic reinforcement learning and structured control.
[8] Fan, J., Fan, Y., Han, X. and Lv, J. (2019). SIMPLE: statistical inference on membership profiles in large networks. arXiv:1910.01734.
[9] Fisher, RA, The use of multiple measurements in taxonomic problems, Ann. Eugen., 7, 179-188 (1936) · doi:10.1111/j.1469-1809.1936.tb02137.x
[10] Gao, C.; Ma, Z.; Zhang, AY; Zhou, HH, Community detection in degree-corrected block models, Ann. Statist., 46, 2153-2185 (2018) · Zbl 1408.62116
[11] Girvan, M.; Newman, MEJ, Community structure in social and biological networks, Proc. Natl. Acad. Sci., 99, 12, 7821-7826 (2002) · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[12] Hastie, T.; Tibshirani, R.; Friedman, J., The elements of statistical learning, 2nd edn (2009), Berlin: Springer, Berlin · Zbl 1273.62005 · doi:10.1007/978-0-387-84858-7
[13] Ji, P.; Jin, J., Coauthorship and citation networks for statisticians (with discussion)., Ann. Appl. Statist., 10, 4, 1779-1812 (2016) · Zbl 1454.62541
[14] Jin, J., Fast community detection by SCORE, Ann. Statist., 43, 57-89 (2015) · Zbl 1310.62076 · doi:10.1214/14-AOS1265
[15] Jin, J. and Ke, Z. T. (2018). Optimal membership estimation, especially for networks with severe degree heterogeneity. Manuscript.
[16] Jin, J., Ke, Z. T. and Luo, S. (2017). Estimating network memberships by simplex vertex hunting. arXiv:1708.07852.
[17] Jin, J., Tracy Ke, Z. and Luo, S. (2019). Optimal adaptivity of signed-polygon statistics for network testing. arXiv:1904.09532.
[18] Jin, J., Ke, Z. T., Luo, S. and Wang, M. (2020). Optimal approach to estimating K in social networks. Manuscript.
[19] Karrer, B.; Newman, M., Stochastic blockmodels and community structure in networks, Phys. Rev. E, 83, 016107 (2011) · doi:10.1103/PhysRevE.83.016107
[20] Ke, Z. T. and Wang, M. (2017). A new SVD approach to optimal topic estimation. arXiv:1704.07016.
[21] Ke, Z. T., Shi, F. and Xia, D. (2020). Community detection for hypergraph networks via regularized tensor power iteration. arXiv:1909.06503.
[22] Lusseau, D.; Schneider, K.; Boisseau, OJ; Haase, P.; Slooten, E.; Dawson, SM, The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations, Behav. Ecol. Sociobiol., 54, 4, 396-405 (2003) · doi:10.1007/s00265-003-0651-y
[23] Liu, Y., Hou, Z., Yao, Z., Bai, Z., Hu, J. and Zheng, S. (2019). Community detection based on the \(\ell_{\infty }\) convergence of eigenvectors in dcbm. arXiv:1906.06713.
[24] Ma, Z.; Ma, Z.; Yuan, H., Universal latent space model fitting for large networks with edge covariates, J. Mach. Learn. Res., 21, 1-67 (2020) · Zbl 1497.68432
[25] Mao, X., Sarkar, P. and Chakrabarti, D. (2020). Estimating mixed memberships with sharp eigenvector deviations. J. Amer. Statist. Assoc. (to appear), 147.
[26] Mihail, M. and Papadimitriou, C. H. (2002). On the eigenvalue power law. In International workshop on randomization and approximation techniques in computer science, pp. 254-262. Springer, Berlin. · Zbl 1028.68569
[27] Nepusz, T.; Petróczi, A.; Négyessy, L.; Bazsó, F., Fuzzy communities and the concept of bridgeness in complex networks, Phys. Rev. E, 77, 1, 016107 (2008) · doi:10.1103/PhysRevE.77.016107
[28] Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. Adv. Neural Inf. Process. Syst. 3120-3128.
[29] Su, L.; Wang, W.; Zhang, Y., Strong consistency of spectral clustering for stochastic block models, IEEE Trans. Inform. Theory, 66, 324-338 (2019) · Zbl 1433.62170 · doi:10.1109/TIT.2019.2934157
[30] Traud, AL; Kelsic, ED; Mucha, PJ; Porter, MA, Comparing community structure to characteristics in online collegiate social networks, SIAM Rev., 53, 526-543 (2011) · doi:10.1137/080734315
[31] Traud, AL; Mucha, PJ; Porter, MA, Social structure of facebook networks, Physica A, 391, 4165-4180 (2012) · doi:10.1016/j.physa.2011.12.021
[32] Zachary, WW, An information flow model for conflict and fission in small groups, J. Anthropol. Res., 33, 4, 452-473 (1977) · doi:10.1086/jar.33.4.3629752
[33] Zhang, Y.; Levina, E.; Zhu, J., Detecting overlapping communities in networks using spectral methods, SIAM J. Math. Anal., 2, 265-283 (2020) · Zbl 1484.62073
[34] Zhao, Y.; Levina, E.; Zhu, J., Consistency of community detection in networks under degree-corrected stochastic block models, Ann. Statist., 40, 2266-2292 (2012) · Zbl 1257.62095 · doi:10.1214/12-AOS1036
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.