×

Explainable graph clustering via expanders in the massively parallel computation model. (English) Zbl 07874033

Summary: Explainable clustering provides human-understandable reasons for decisions in black-box learning models. In a previous work, a decision tree built on the set of dimensions was used to define ranges of values for \(k\)-means clusters. For explainable graph clustering, we use expander graphs instead of dense subgraphs since powering an expander graph is guaranteed to result in a clique after at most a logarithmic number of steps.
Consider a set of multi-dimensional points labeled with \(k\) labels. We introduce the heat map sorting problem as reordering the rows and columns of an input matrix (each point is a column and each row is a dimension) such that the labels of the entries of the matrix form connected components (clusters). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. In the massively parallel computation model (MPC), each machine has a sublinear memory and the total memory of the machines is linear. We prove the problem is NP-hard. We give a fixed-parameter algorithm in MPC and an approximation algorithm based on expander decomposition. We empirically compare our algorithm with explainable \(k\)-means on several graphs of email and computer networks.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
62H30 Classification and discrimination; cluster analysis (statistical aspects)

Software:

SNAP
Full Text: DOI

References:

[1] Marcinkevičs, R.; Vogt, J. E., Interpretability and explainability: a machine learning zoo mini-tour, 2020, arXiv preprint
[2] Holzinger, A.; Saranti, A.; Molnar, C.; Biecek, P.; Samek, W., Explainable ai methods-a brief overview, (International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers, 2022, Springer), 13-38
[3] Dasgupta, S.; Frost, N.; Moshkovitz, M.; Rashtchian, C., Explainable k-means and k-medians clustering, (Proceedings of the 37th International Conference on Machine Learning. Proceedings of the 37th International Conference on Machine Learning, Vienna, Austria, 2020), 12-18
[4] Kriegel, H.-P.; Kröger, P.; Zimek, A., Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering, Proc. VLDB Endow., 1, 2, 1528-1529, 2008
[5] Pfeifer, B.; Saranti, A.; Holzinger, A., Gnn-subnet: disease subnetwork detection with explainable graph neural networks, Bioinformatics, 38, Supplement_2, ii120-ii126, 2022
[6] Lo, W. W.; Kulatilleke, G.; Sarhan, M.; Layeghy, S.; Portmann, M., XG-BoT: an explainable deep graph neural network for botnet detection and forensics, Int. Things, 22, Article 100747 pp., 2023
[7] Uhříček, D.; Hynek, K.; Čejka, T.; Kolář, D., BOTA: explainable IoT malware detection in large networks, IEEE Int. Things J., 10, 10, 8416-8431, 2023
[8] Laber, E.; Murtinho, L.; Oliveira, F., Shallow decision trees for explainable k-means clustering, Pattern Recognit., 137, Article 109239 pp., 2023
[9] Beame, P.; Koutris, P.; Suciu, D., Communication steps for parallel query processing, (32nd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2013, 2013)
[10] Beame, P.; Koutris, P.; Suciu, D., Communication steps for parallel query processing, J. ACM, 64, 6, 40, 2017 · Zbl 1426.68074
[11] Karloff, H.; Suri, S.; Vassilvitskii, S., A model of computation for mapreduce, (Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms, 2010, Society for Industrial and Applied Mathematics), 938-948 · Zbl 1288.68247
[12] Fish, B.; Kun, J.; Lelkes, A. D.; Reyzin, L.; Turán, G., On the computational complexity of mapreduce, (Distributed Computing: 29th International Symposium, Proceedings 29. Distributed Computing: 29th International Symposium, Proceedings 29, DISC 2015, Tokyo, Japan, October 7-9, 2015, 2015, Springer), 1-15 · Zbl 1394.68176
[13] Frei, F.; Wada, K., Efficient circuit simulation in mapreduce, (30th International Symposium on Algorithms and Computation, 2019, Schloss Dagstuhl-Leibniz-Zentrum Fuer Informatik)
[14] Yaroslavtsev, G.; Vadapalli, A., Massively parallel algorithms and hardness for single-linkage clustering under \(\ell_p\)-distances, (Proceedings of the 35th International Conference on Machine Learning, 2018)
[15] Andoni, A.; Song, Z.; Stein, C.; Wang, Z.; Zhong, P., Parallel graph connectivity in log diameter rounds, (2018 IEEE 59th Annual Symposium on Foundations of Computer Science, 2018, IEEE), 674-685
[16] Cygan, M.; Fomin, F. V.; Kowalik, Ł.; Lokshtanov, D.; Marx, D.; Pilipczuk, M.; Pilipczuk, M.; Saurabh, S., Parameterized Algorithms, vol. 4, 2015, Springer: Springer Switzerland · Zbl 1334.90001
[17] Niedermeier, R., Invitation to Fixed-Parameter Algorithms, vol. 31, 2006, OUP: OUP Oxford · Zbl 1095.68038
[18] Chang, Y.-J.; Saranurak, T., Deterministic distributed expander decomposition and routing with applications in distributed derandomization, (2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), 2020, IEEE), 377-388
[19] Zenklusen, R., A 1.5-approximation for path tsp, (Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019, SIAM), 1539-1549 · Zbl 1431.68157
[20] Frost, N., ExKMC, 2023
[21] Leskovec, J.; Krevl, A., SNAP datasets: Stanford large network dataset collection, Jun. 2014
[22] Leskovec, J.; Kleinberg, J.; Faloutsos, C., Graph evolution: densification and shrinking diameters, ACM Trans. Knowl. Discov. Data, 1, 1, 2007, 2-es
[23] Yin, H.; Benson, A. R.; Leskovec, J.; Gleich, D. F., Local higher-order graph clustering, (Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2017), 555-564
[24] Madani, O.; Averineni, S. A.; Gandham, S., A dataset of networks of computing hosts, (Proceedings of the 2022 ACM on International Workshop on Security and Privacy Analytics, 2022), 100-104
[25] Frost, N.; Moshkovitz, M.; Rashtchian, C., Exkmc: expanding explainable k-means clustering, 2020, arXiv preprint
[26] Shapira, S. D.; Gat-Viks, I.; Shum, B. O.; Dricot, A.; de Grace, M. M.; Wu, L.; Gupta, P. B.; Hao, T.; Silver, S. J.; Root, D. E., A physical and regulatory map of host-influenza interactions reveals pathways in H1N1 infection, Cell, 139, 7, 1255-1267, 2009
[27] Zhou, J.; Wang, M.; Sun, T.; Zhou, X.; Wang, J.; Wang, Y.; Zhang, R.; Luo, R.; Yu, H., Uncovering anti-influenza mechanism of ophiocordyceps sinensis using network pharmacology, molecular pharmacology, and metabolomics, Medicine, 102, 35, Article e34843 pp., 2023
[28] Hashem, A. M.; Gravel, C.; Chen, Z.; Yi, Y.; Tocchi, M.; Jaentschke, B.; Fan, X.; Li, C.; Rosu-Myles, M.; Pereboev, A., CD40 ligand preferentially modulates immune response and enhances protection against influenza virus, J. Immunol., 193, 2, 722-734, 2014
[29] Subhashini, D.; Anand, D. A., Network biology of KEGG enriched viral comorbidities in psoriasis subjects, (2021 Innovations in Power and Advanced Computing Technologies (i-PACT), 2021, IEEE), 1-6
[30] Ye, Y.; Gaugler, B.; Mohty, M.; Malard, F., Plasmacytoid dendritic cell biology and its role in immune-mediated diseases, Clin. Transl. Immunol., 9, 5, Article e1139 pp., 2020
[31] Chen, Y.; Zhu, S.; Liao, T.; Wang, C.; Han, J.; Yang, Z.; Lu, X.; Hu, Z.; Hu, J.; Wang, X., The HN protein of newcastle disease virus induces cell apoptosis through the induction of lysosomal membrane permeabilization, PLoS Pathog., 20, 2, Article e1011981 pp., 2024
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.