
Model-based clustering of multiple networks with a hierarchical algorithm. (English) Zbl 1529.62033

Summary: The paper tackles the problem of clustering multiple networks, directed or not and not necessarily sharing the same set of vertices, into groups of networks with similar topology. A statistical model-based approach relying on a finite mixture of stochastic block models is proposed. A clustering is obtained by maximizing the integrated classification likelihood criterion. This is done with a hierarchical agglomerative algorithm that starts from singleton clusters and successively merges clusters of networks. The algorithm thus computes a sequence of nested clusterings, which can be represented by a dendrogram providing valuable insights into the collection of networks. In a Bayesian framework, model selection is performed automatically, since the algorithm stops once the best number of clusters is reached. When carefully implemented, the algorithm is computationally efficient. Aggregating clusters requires a means of overcoming the label-switching problem of the stochastic block model and of matching the block labels across networks. To address this problem, a new tool based on a comparison of the graphons of the associated stochastic block models is proposed. The clustering approach is assessed on synthetic data, and an application to a set of ecological networks illustrates the interpretability of the results.
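To give a concrete flavour of the graphon-based label matching, the following minimal Python sketch (not the authors' implementation; the function names, the brute-force search over permutations, and the fixed evaluation grid are illustrative assumptions) compares the step-function graphons induced by two fitted stochastic block models, each given by its block proportions pi and connectivity matrix B, and returns the relabelling of the second model's blocks that minimizes the L2 distance between the two graphons.

```python
# Minimal sketch of graphon-based block-label matching between two fitted
# stochastic block models (SBMs). This is an illustration, not the paper's
# implementation: the brute-force search over permutations and the fixed
# evaluation grid are simplifying assumptions.
import itertools
import numpy as np

def sbm_graphon_l2(pi1, B1, pi2, B2, grid=400):
    """L2 distance between the step-function graphons of two SBMs.
    Block k of an SBM with proportions pi occupies an interval of length
    pi[k] on [0, 1]; the graphon equals B[k, l] on the corresponding cell."""
    u = (np.arange(grid) + 0.5) / grid                      # grid points in (0, 1)
    b1 = np.minimum(np.searchsorted(np.cumsum(pi1), u), len(pi1) - 1)
    b2 = np.minimum(np.searchsorted(np.cumsum(pi2), u), len(pi2) - 1)
    W1 = np.asarray(B1)[np.ix_(b1, b1)]                     # graphon values on the grid
    W2 = np.asarray(B2)[np.ix_(b2, b2)]
    return np.sqrt(np.mean((W1 - W2) ** 2))

def best_label_matching(pi1, B1, pi2, B2):
    """Relabelling of the second SBM's blocks whose graphon is closest to the
    first SBM's graphon, found by exhaustive search over permutations."""
    pi2, B2 = np.asarray(pi2), np.asarray(B2)
    candidates = (
        (p, sbm_graphon_l2(pi1, B1, pi2[list(p)], B2[np.ix_(p, p)]))
        for p in itertools.permutations(range(len(pi2)))
    )
    return min(candidates, key=lambda c: c[1])

# Example: the second SBM is the first one with its two blocks swapped,
# so the matching should recover the permutation (1, 0) with distance 0.
pi = [0.6, 0.4]
B = np.array([[0.8, 0.1], [0.1, 0.3]])
perm, dist = best_label_matching(pi, B, pi[::-1], B[::-1, ::-1])
print(perm, round(dist, 6))                                 # (1, 0) 0.0
```

Exhaustive search over permutations is only practical for small numbers of blocks; it is used here purely for readability.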

MSC:

62-08 Computational methods for problems pertaining to statistics
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

blockmodels
