×

A consistent adjacency spectral embedding for stochastic blockmodel graphs. (English) Zbl 1443.62188

Summary: We present a method to estimate block membership of nodes in a random graph generated by a stochastic blockmodel. We use an embedding procedure motivated by the random dot product graph model, a particular example of the latent position model. The embedding associates each node with a vector; these vectors are clustered via minimization of a square error criterion. We prove that this method is consistent for assigning nodes to blocks, as only a negligible number of nodes will be misassigned. We prove consistency of the method for directed and undirected graphs. The consistent block assignment makes possible consistent parameter estimation for a stochastic blockmodel. We extend the result in the setting where the number of blocks grows slowly with the number of nodes. Our method is also computationally feasible even for very large graphs. We compare our method with Laplacian spectral clustering through analysis of simulated data and a graph derived from Wikipedia documents.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
05C80 Random graphs (graph-theoretic aspects)

References:

[1] Airoldi E. M., The Journal of Machine Learning Research 9 pp 1981– (2008)
[2] Bickel P. J., Proceedings of the National Academy of Sciences of the United States of America 106 (50) pp 21068– (2009) · Zbl 1359.62411 · doi:10.1073/pnas.0907096106
[3] Choi D. S., Biometrika 99 pp 273– (2010) · Zbl 1318.62207 · doi:10.1093/biomet/asr053
[4] Condon A., Random Structures and Algorithms 18 (2) pp 116– (2001) · Zbl 0972.68129 · doi:10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2
[5] Davis C., SIAM Journal on Numerical Analysis 7 pp 1– (1970) · Zbl 0198.47201 · doi:10.1137/0707001
[6] Eckart C., Psychometika 1 (3) pp 211– (1936) · JFM 62.1075.02 · doi:10.1007/BF02288367
[7] Fjallstrom P., Computer and Information Science 3 (10) (1998)
[8] Fortunato S., Physics Reports 486 (3) pp 75– (2010) · doi:10.1016/j.physrep.2009.11.002
[9] Goldenberg A., Foundations and Trends in Machine Learning 2 (2) pp 129– (2009) · Zbl 1184.68030 · doi:10.1561/2200000005
[10] Handcock M. S., Journal of the Royal Statistical Society, Series A 170 (2) pp 301– (2007)
[11] Hoff P., Journal of the American Statistical Association 97 pp 1090– (2002) · Zbl 1041.62098 · doi:10.1198/016214502388618906
[12] Holland P. W., Social Networks 5 (2) pp 109– (1983) · doi:10.1016/0378-8733(83)90021-7
[13] Horn R., Matrix Analysis (1985) · Zbl 0576.15001 · doi:10.1017/CBO9780511810817
[14] Hubert L., Journal of Classification 2 (1) pp 193– (1985) · doi:10.1007/BF01908075
[15] Marchette D., Proceedings of the 57th ISI World Statistics Congress (2011)
[16] McSherry F., Proceedings 2001 IEEE International Conference on Cluster Computing pp 529– (2001)
[17] Newman M., Physical Review 69 (2) pp 1– (2004)
[18] Nowicki K., Journal of the American Statistical Association 96 (455) pp 1077– (2001) · Zbl 1072.62542 · doi:10.1198/016214501753208735
[19] Qiu L., SIAM Journal on Matrix Analysis and Applications 27 (2) (2005)
[20] Scheinerman E., Computational Statistics 25 (1) pp 1– (2010) · Zbl 1223.62109 · doi:10.1007/s00180-009-0158-8
[21] Snijders T., Journal of Classification 14 pp 75– (1997) · Zbl 0896.62063 · doi:10.1007/s003579900004
[22] Wang Y. J., Journal of the American Statistical Association 82 (397) pp 8– (1987) · doi:10.1080/01621459.1987.10478385
[23] Young S., Proceedings of the 5th International Conference on Algorithms and Models for the Web-Graph pp 138– (2007) · Zbl 1136.05322 · doi:10.1007/978-3-540-77004-6_11
[24] Zhang F., IEEE Transactions on Automatic Control 51 (9) pp 1506– (2006) · Zbl 1366.15014 · doi:10.1109/TAC.2006.880787
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.