×

Directed community detection with network embedding. (English) Zbl 1515.62020

Summary: Community detection in network data aims at grouping similar nodes sharing certain characteristics together. Most existing methods focus on detecting communities in undirected networks, where similarity between nodes is measured by their node features and whether they are connected. In this article, we propose a novel method to conduct network embedding and community detection simultaneously in a directed network. The network embedding model introduces two sets of vectors to represent the out- and in-nodes separately, and thus allows the same nodes belong to different out- and in-communities. The community detection formulation equips the negative log-likelihood with a novel regularization term to encourage community structure among the nodes representations, and thus achieves better performance by jointly estimating the nodes embeddings and their community structures. To tackle the resultant optimization task, an efficient alternative updating scheme is developed. More importantly, the asymptotic properties of the proposed method are established in terms of both network embedding and community detection, which are also supported by numerical experiments on some simulated and real examples.

MSC:

62-08 Computational methods for problems pertaining to statistics
62H22 Probabilistic graphical models

Software:

MCODE; SNAP; clusfind

References:

[1] Bader, G. D.; Hogue, C. W. V., “An Automated Method for Finding Molecular Complexes in Large Protein Interaction Networks, BMC Bioinformatics, 4, 2 (2003) · doi:10.1186/1471-2105-4-2
[2] Bickel, P. J.; Sarkar, P., “Hypothesis Testing for Automated Community Detection in Networks, Journal of the Royal Statistical Society, Series B, 78, 253-273 (2016) · Zbl 1411.62162 · doi:10.1111/rssb.12117
[3] Chen, K.; Lei, J., “Network Cross-Validation for Determining the Number of Communities in Network Data, Journal of the American Statistical Association, 113, 241-251 (2018) · Zbl 1398.62159 · doi:10.1080/01621459.2016.1246365
[4] Chien, I. E.; Lin, C. Y.; Wang, I. H., “On the Minimax Misclassification Ratio of Hypergraph Community Detection, IEEE Transactions on Information Theory, 65, 8095-8118 (2019) · Zbl 1433.94038 · doi:10.1109/TIT.2019.2928301
[5] Chung, F. R. K.; Graham, F. C., Spectral Graph Theory (No. 92 (1997), Providence, RI: American Mathematical Society, Providence, RI · Zbl 0867.05046
[6] Cui, P.; Wang, X.; Pei, J.; Zhu, W., “A Survey on Network Embedding, IEEE Transactions on Knowledge and Data Engineering, 31, 833-852 (2018) · doi:10.1109/TKDE.2018.2849727
[7] Dhillon, I. S., Co-Clustering Documents and Words Using Bipartite Spectral Graph Partitioning, the Seventh ACM SIGKDD International Conference (2001) · doi:10.1145/502512.502550
[8] Fienberg, S. E., “A Brief History of Statistical Models for Network Analysis and Open Challenges, Journal of Computational and Graphical Statistics, 21, 825-839 (2012) · doi:10.1080/10618600.2012.738106
[9] Girvan, M.; Newman, M. E. J., “Community Structure in Social and Biological Networks, Proceedings of the National Academy of Sciences of the United States of America, 99, 7821-7826 (2002) · Zbl 1032.91716 · doi:10.1073/pnas.122653799
[10] Gorski, J.; Pfeuffer, F.; Klamroth, K., “Biconvex Sets and Optimization With Biconvex Functions: A Survey and Extensions, Mathematical Methods of Operations Research, 66, 373-407 (2007) · Zbl 1146.90495 · doi:10.1007/s00186-007-0161-1
[11] Guimerà, R.; Sales-Pardo, M.; Amaral, L. A. N., “Module Identification in Bipartite and Directed Networks, Physical Review E, 76, 036102 (2007) · doi:10.1103/PhysRevE.76.036102
[12] Guo, X., Qiu, Y., Zhang, H., and Chang, X. (2020), “Randomized Spectral Co-Clustering for Large-Scale Directed Networks,” arXiv no. 2004.12164v2.
[13] Hoff, P. D.; Raftery, A. E.; Handcock, M. S., “Latent Space Approaches to Social Network Analysis, Journal of the American Statistical Association, 97, 1090-1098 (2002) · Zbl 1041.62098 · doi:10.1198/016214502388618906
[14] Holland, P. W.; Laskey, K. B.; Leinhardt, S., “Stochastic Blockmodels: First Steps, Social Networks, 5, 109-137 (1983) · doi:10.1016/0378-8733(83)90021-7
[15] Ji, P.; Jin, J., “Coauthorship and Citation Networks for Statisticians, The Annals of Applied Statistics, 10, 1779-1812 (2016) · Zbl 1454.62541 · doi:10.1214/15-AOAS896
[16] Joe, H.; Ward, J., “Hierarchical Grouping to Optimize an Objective Function, Journal of the American Statistical Association, 58, 236-244 (1963) · doi:10.1080/01621459.1963.10500845
[17] Jure, L.; Andrej, K., “SNAP Datasets: Stanford Large Network Dataset Collection,” (2014)
[18] Kaufman, L.; Rousseeuw, P. J., Finding Groups in Data: An Introduction to Cluster Analysis, 344 (2009), Hoboken, NJ: Wiley, Hoboken, NJ
[19] Ke, Z. T., Shi, F., and Xia, D. (2020), “Community Detection for Hypergraph Networks via Regularized Tensor Power Iteration,” arXiv no. 1909.06503.
[20] Klein, T.; Rio, E., “Concentration Around the Mean for Maxima of Empirical Processes, The Annals of Probability, 33, 1060-1077 (2005) · Zbl 1066.60023 · doi:10.1214/009117905000000044
[21] Kluger, Y.; Basri, R.; Chang, J. T.; Gerstein, M., “Spectral Biclustering of Microarray Data: Coclustering Genes and Conditions, Genome Research, 13, 703-716 (2003) · doi:10.1101/gr.648603
[22] Koltchinskii, V., Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems: Ecole d’Eté de Probabilités de Saint-Flour XXXVIII-2008, 2033 (2011), Berlin, Heidelberg: Springer-Verlag, Berlin, Heidelberg · Zbl 1223.91002
[23] Kossinets, G.; Watts, D. J., “Empirical Analysis of an Evolving Social Network, Science, 311, 88-90 (2006) · Zbl 1226.91055 · doi:10.1126/science.1116869
[24] Lei, J.; Rinaldo, A., “Consistency of Spectral Clustering in Stochastic Block Models, The Annals of Statistics, 43, 215-237 (2015) · Zbl 1308.62041 · doi:10.1214/14-AOS1274
[25] Leicht, E. A.; Newman, M. E. J., “Community Structure in Directed Networks, Physical Review Letters, 100, 118703 (2008) · doi:10.1103/PhysRevLett.100.118703
[26] Leskovec, J.; Lang, K. J.; Dasgupta, A.; Mahoney, M. W., Statistical Properties of Community Structure in Large Social and Information Networks (2008) · doi:10.1145/1367497.1367591
[27] Li, T.; Levina, E.; Zhu, J., “Network Cross-Validation by Edge Sampling, Biometrika, 107, 257-276 (2020) · Zbl 1441.62049 · doi:10.1093/biomet/asaa006
[28] Lin, Y., “A Note on Margin-Based Loss Functions in Classification, Statistics & Probability Letters, 68, 73-82 (2004) · Zbl 1058.62052
[29] Ma, Z.; Ma, Z.; Yuan, H., “Universal Latent Space Model Fitting for Large Networks With Edge Covariates,”, Journal of Machine Learning Research, 21, 1-67 (2020) · Zbl 1497.68432
[30] Mariadassou, M.; Robin, S.; Vacher, C., “Uncovering Latent Structure in Valued Graphs: A Variational Approach, The Annals of Applied Statistics, 4, 715-742 (2010) · Zbl 1194.62125 · doi:10.1214/10-AOAS361
[31] Morrison, D.; Jacobson, S.; Sauppe, J.; Sewell, E., “Branch-and-Bound Algorithms: A Survey of Recent Advances in Searching, Branching, and Pruning, Discrete Optimization, 19, 79-102 (2016) · Zbl 1387.90010 · doi:10.1016/j.disopt.2016.01.005
[32] Nepusz, T.; Yu, H.; Paccanaro, A., “Detecting Overlapping Protein Complexes in Protein-Protein Interaction Networks, Nature Methods, 9, 471 (2012) · doi:10.1038/nmeth.1938
[33] Newman, M. E. J., “Modularity and Community Structure in Networks, Proceedings of the National Academy of Sciences of the United States of America, 103, 8577-8582 (2006) · doi:10.1073/pnas.0601602103
[34] Perozzi, B.; Al-Rfou, R.; Skiena, S., Deepwalk (2014)
[35] Rohe, K.; Qin, T.; Yu, B., “Co-Clustering Directed Graphs to Discover Asymmetries and Directional Communities, Proceedings of the National Academy of Sciences of the United States of America, 113, 12679-12684 (2016) · Zbl 1406.91306 · doi:10.1073/pnas.1525793113
[36] Satuluri, V.; Parthasarathy, S., Symmetrizations for Clustering Directed Graphs (2011)
[37] Sengupta, S.; Chen, Y., “A Block Model for Node Popularity in Networks With Community Structure, Journal of the Royal Statistical Society, Series B, 80, 365-386 (2018) · Zbl 1458.62166 · doi:10.1111/rssb.12245
[38] Stanton-Salazar, R. D.; Dornbusch, S. M., “Social Capital and the Reproduction of Inequality: Information Networks Among Mexican-Origin High School Students, Sociology of Education, 68, 116-135 (1995) · doi:10.2307/2112778
[39] Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q., 1067-1077 (2015)
[40] Tang, X.; Xue, F.; Qu, A., “Individualized Multidirectional Variable Selection,”, Journal of the American Statistical Association, 0, 1-17 (2020)
[41] Von Luxburg, U., “A Tutorial on Spectral Clustering, Statistics and Computing, 17, 395-416 (2007) · doi:10.1007/s11222-007-9033-z
[42] Wang, D.; Cui, P.; Zhu, W., Structural Deep Network Embedding (2016)
[43] Wang, J., “Consistent Selection of the Number of Clusters via Crossvalidation, Biometrika, 97, 893-904 (2010) · Zbl 1204.62104 · doi:10.1093/biomet/asq061
[44] Wang, X., Cui, P., Wang, J., Pei, J., Zhu, W., and Yang, S. (2017), “Community Preserving Network Embedding,” in Thirty-first AAAI Conference on Artificial Intelligence.
[45] Witten, D. M.; Shojaie, A.; Zhang, F., “The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping, Technometrics, 56, 112-122 (2014) · doi:10.1080/00401706.2013.810174
[46] Zhang, Y.; Levina, E.; Zhu, J., “Estimating Network Edge Probabilities by Neighbourhood Smoothing, Biometrika, 104, 771-783 (2017) · Zbl 07072327 · doi:10.1093/biomet/asx042
[47] Zhao, Y.; Levina, E.; Zhu, J., “Consistency of Community Detection in Networks Under Degree-Corrected Stochastic Block Models, The Annals of Statistics, 40, 2266-2292 (2012) · Zbl 1257.62095 · doi:10.1214/12-AOS1036
[48] Zhu, Y.; Shen, X.; Ye, C., “Personalized Prediction and Sparsity Pursuit in Latent Factor Models, Journal of the American Statistical Association, 111, 241-252 (2016) · doi:10.1080/01621459.2014.999158
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.