×

Parallel maximum clique algorithms with applications to network analysis. (English) Zbl 1323.05103

Summary: We present a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from a thousand to a hundred million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. At its heart the algorithm employs a branch-and-bound strategy with novel and aggressive pruning techniques. The pruning techniques include the combined use of core numbers of vertices along with a good initial heuristic solution to remove the vast majority of the search space. In addition, the exploration of the search tree is parallelized. During the search, processes immediately communicate changes to upper and lower bounds on the size of the maximum clique. This exchange of information occasionally results in a superlinear speedup because tasks with large search spaces can be pruned by other processes. We demonstrate the impact of the algorithm on applications using two different network analysis problems: computation of temporal strong components in dynamic networks and determination of compression-friendly ordering of nodes of massive networks.

MSC:

05C69 Vertex subsets with special properties (dominating sets, independent sets, cliques, etc.)
05C82 Small world graphs, complex networks (graph-theoretic aspects)
05C85 Graph algorithms (graph-theoretic aspects)
05C90 Applications of graph theory
90C27 Combinatorial optimization
Full Text: DOI

References:

[1] N. K. Ahmed, F. Berchmans, J. Neville, and R. Kompella, {\it Time-based sampling of social network activity graphs}, in Proceedings of the Eighth Workshop on Mining and Learning with Graphs (MLG’10), ACM, Washington, DC, 2010, pp. 1-9.
[2] V. Batagelj and M. Zaversnik, {\it An \(o (m)\) algorithm for core decomposition of networks}, arXiv:0310049, 2003. · Zbl 1284.05252
[3] S. Bhadra and A. Ferreira, {\it Complexity of connected components in evolving graphs and the computation of multicast trees in dynamic networks}, in Proceedings of ADHOC-NOW 2003, Lecture Notes in Comput. Sci. 2865, Springer, New York, 2003, pp. 259-270.
[4] P. Boldi, B. Codenotti, M. Santini, and S. Vigna, {\it UbiCrawler: A scalable fully distributed web crawler}, Softw. Pract. Exper., 34 (2004), pp. 711-726.
[5] P. Boldi, M. Rosa, M. Santini, and S. Vigna, {\it Layered label propagation: A multiresolution coordinate-free ordering for compressing social networks}, in Proceedings of the 20th International Conference on the World Wide Web, ACM, New York, 2011, pp. 587-596.
[6] P. Boldi and S. Vigna, {\it The webgraph framework I: Compression techniques}, in Proceedings of the 13th International Conference on the World Wide Web, ACM, New York, 2004, pp. 595-602.
[7] P. Boldi and S. Vigna, {\it Codes for the world wide web}, Internet Math., 2 (2005), pp. 407-429. · Zbl 1101.94013
[8] I. M. Bomze, M. Budinich, P. M. Pardalos, and M. Pelillo, {\it The maximum clique problem}, in Handbook of Combinatorial Optimization 4, D.-Z. Du and P. Pardalos, eds., Springer, New York, 1999, pp. 1-74. · Zbl 1253.90188
[9] C. Bron and J. Kerbosch, {\it Algorithm 457: Finding all cliques of an undirected graph}, Comm. ACM, 16 (1973), pp. 575-577. · Zbl 0261.68018
[10] G. Buehrer and K. Chellapilla, {\it A scalable pattern mining approach to web graph compression with communities}, in Proceedings of the International Conference on Web Search and Web Data Mining (WSDM2008), ACM, New York, 2008, pp. 95-106.
[11] CAIDA, {\it Skitter}, software available from \burlhttp://caida.org/tools/measurement/skitter/.
[12] M. Charikar, {\it Greedy approximation algorithms for finding dense components in a graph}, in Proceedings of the Third International Workshop on Approximation Algorithms for Combinatorial Optimization (APPROX ’00), Springer-Verlag, New York, Berlin, 2000, pp. 84-95. · Zbl 0976.05062
[13] J. Cheng, Y. Ke, A. W-C. Fu, J. X. Yu, and L. Zhu, {\it Finding maximal cliques in massive networks}, ACM Trans. Database Syst., 36 (2011), 21.
[14] J. Cheng, L. Zhu, Y. Ke, and S. Chu, {\it Fast algorithms for maximal clique enumeration with limited memory}, in Proceedings of the ACM SIGKDD, 2012, pp. 1240-1248.
[15] E. Cho, S. A. Myers, and J. Leskovec, {\it Friendship and mobility: User movement in location-based social networks}, in Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’11), ACM, New York, 2011, pp. 1082-1090.
[16] M. D. Conover, J. Ratkiewicz, M. Francisco, B. Gonçalves, A. Flammini, and F. Menczer, {\it Political polarization on Twitter}, in Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media (ICWSM), 2011, pp. 89-96.
[17] P. G. Constantine and D. F. Gleich, {\it Using polynomial chaos to compute the influence of multiple random surfers in the PageRank model}, in Proceedings of the 5th Workshop on Algorithms and Models for the Web Graph (WAW2007), A. Bonato and F. C. Graham, eds., Lecture Notes in Comput. Sci. 4863, Springer, New York, 2007, pp. 82-95. · Zbl 1136.68321
[18] G. Csardi and T. Nepusz, {\it The igraph software package for complex network research}, InterJournal, Complex Systems, (2006), 1695; online at http://necsi.org/events/iccs6/papers/c1602a3c126ba822d0bc4293371c.pdf.
[19] E. D. Dolan and J. J. Moré, {\it Benchmarking optimization software with performance profiles}, Math.l Program., 91 (2002), pp. 201-213. · Zbl 1049.90004
[20] N. Du, C. Faloutsos, B. Wang, and L. Akoglu, {\it Large human communication networks: Patterns and a utility-driven generator}, in Proceedings of the ACM SIGKDD, 2009, pp. 269-278.
[21] N. Du, B. Wu, L. Xu, B. Wang, and P. Xin, {\it Parallel algorithm for enumerating maximal cliques in complex network}, in Mining Complex Data, Vol. 165, Springer, New York, 2009, pp. 207-221. · Zbl 1171.68855
[22] N. Eagle and A. Pentland, {\it Reality mining: Sensing complex social systems}, Personal and Ubiquitous Computing, 10 (2006), pp. 255-268.
[23] D. Eppstein, M. Löffler, and D. Strash, {\it Listing all maximal cliques in sparse graphs in near-optimal time}, Algorithms Comput., (2010), pp. 403-414. · Zbl 1311.05187
[24] D. Eppstein and D. Strash, {\it Listing all maximal cliques in large sparse real-world graphs}, Experiment. Alg., (2011), pp. 364-375. · Zbl 1365.05276
[25] P. Erdös and A. Hajnal, {\it On chromatic number of graphs and set-systems}, Acta Math. Acad. Sci. Hungarica, 17 (1966), pp. 61-99. · Zbl 0151.33701
[26] A. Ferreira, {\it On models and algorithms for dynamic communication networks: The case for evolving graphs}, in Proceedings of ALGOTEL, 2002, pp. 155-161.
[27] H. J. Finck and H. Sachs, {\it Über eine von H.S. Wilf angegebene Schranke für die chromatische Zahl endlicher Graphen}, Math. Nachr., 39 (1969), pp. 373-386. · Zbl 0172.25704
[28] M. Gjoka, M. Kurant, C. T. Butts, and A. Markopoulou, {\it Walking in Facebook: A case study of unbiased sampling of OSNs}, in Proceedings of INFOCOM, 2010, pp. 1-9.
[29] D. F. Gleich, {\it Graph of Flickr photo-sharing social network}, in DOI: 10.4231/D39P2W550, 2012.
[30] L. Isella, J. Stehlé, A. Barrat, C. Cattuto, J. F. Pinton, and W. Van den Broeck, {\it What’s in a crowd? Analysis of face-to-face behavioral networks}, J. Theoret. Biol., 271 (2011), pp. 166-180. · Zbl 1405.92255
[31] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A. L. Barabasi, {\it The large-scale organization of metabolic networks}, Nature, 407 (2000), pp. 651-654.
[32] U. Kang and C. Faloutsos, {\it Beyond “caveman communities”: Hubs and spokes for graph compression and mining}, in Proceedings of ICDM 2011, IEEE Press, Piscataway, NJ, 2011, pp. 300-309.
[33] C. Karande, K. Chellapilla, and R. Andersen, {\it Speeding up algorithms on compressed web graphs}, in Proceedings of the Second ACM International Conference on Web Search and Data Mining (WSDM ’09), ACM, New York, 2009, pp. 272-281. · Zbl 1235.68037
[34] S. Khot, {\it Improved inapproximability results for MaxClique, chromatic number and approximate graph coloring}, in Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 600-609.
[35] J. Konc and D. Janezic, {\it An improved branch and bound algorithm for the maximum clique problem}, Proteins, 4 (2007), p. 5. · Zbl 1274.05452
[36] G. Kortsarz and D. Peleg, {\it Generating sparse 2-spanners}, J. Alg., 17 (1994), pp. 222-236. · Zbl 0867.05072
[37] H. Kwak, C. Lee, H. Park, and S. Moon, {\it What is Twitter, a social network or a news media?}, in Proceedings of the 19th International Conference on the World Wide Web, ACM, New York, 2010, pp. 591-600.
[38] J. Leskovec, D. Huttenlocher, and J. Kleinberg, {\it Predicting positive and negative links in online social networks}, in Proceedings of the 19th International Conference on the World Wide Web, ACM, New York, 2010, pp. 641-650.
[39] J. Leskovec, K. J. Lang, A. Dasgupta, and M. W. Mahoney, {\it Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters}, Internet. Math., 6 (2009), pp. 29-123. · Zbl 1205.91144
[40] D. W. Matula, {\it A max-min theorem for graphs with application to graph coloring}, SIAM Rev., 10 (1968), pp. 481-482.
[41] C. McCreesh and P. Prosser, {\it Multi-threading a state-of-the-art maximum clique algorithm}, Algorithms, 6 (2013), pp. 618-635. · Zbl 1461.05215
[42] V. Nicosia, J. Tang, M. Musolesi, G. Russo, C. Mascolo, and V. Latora, {\it Components in time-varying graphs}, Chaos, 22 (2012), 023101. · Zbl 1331.68156
[43] T. Opsahl and P. Panzarasa, {\it Clustering in weighted networks}, Social Networks, 31 (2009), pp. 155-163.
[44] P. R. J. Österg\aard, {\it A fast algorithm for the maximum clique problem}, Disc. Appl. Math., 120 (2002), pp. 197-207. · Zbl 1019.05054
[45] G. Palla, I. J. Farkas, P. Pollner, I. Derényi, and T. Vicsek, {\it Fundamental statistical features and self-similar properties of tagged networks}, New J. Phys., 10 (2008), 123026.
[46] R. K. Pan and J. Saramäki, {\it Path lengths, correlations, and centrality in temporal networks}, Phys. Rev. E, 84 (2011), 016105.
[47] P. M. Pardalos and J. Xue, {\it The maximum clique problem}, J. Global Opt., 4 (1994), pp. 301-328. · Zbl 0797.90108
[48] B. Pattabiraman, Md. Mostofa Ali Patwary, A. H. Gebremedhin, W.-K. Liao, and A. Choudhary, {\it Fast algorithms for the maximum clique problem on massive sparse graphs}, in Proceedings of WAW 2013, Algorithms and Models for the Web Graph, Lecture Notes in Comput. Sci. 8305, Springer, New York, 2013, pp. 156-169. · Zbl 1342.05185
[49] B. Pattabiraman, Md. Mostofa Ali Patwary, A. H. Gebremedhin, W.-K. Liao, and A. Choudhary, {\it Fast algorithms for the maximum clique problem on massive sparse graphs with applications to overlapping community detection}, Internet. Math., 11 (2015), pp. 421-448. · Zbl 1342.05185
[50] P. Prosser, {\it Exact algorithms for maximum clique: A computational study}, arXiv:1207.4616v1, 2012. · Zbl 1461.90162
[51] R. A. Rossi, D. F. Gleich, A. H. Gebremedhin, and Md. Mostofa Ali Patwary, {\it Fast maximum clique algorithms for large graphs}, in the Companion Publication of the 23rd International Conference on the World Wide Web (WWW Companion ’14), International World Wide Web Conferences Steering Committee, Geneva, Switzerland, 2014, pp. 365-366.
[52] P. San Segundo, D. Rodríguez-Losada, and A. Jiménez, {\it An exact bit-parallel algorithm for the maximum clique problem}, Comput. Oper. Res., 38 (2011), pp. 571-581. · Zbl 1231.90369
[53] A. E. Saríyüce, B. Gedik, G. Jacques-Silva, K.-L. Wu, and Ü. V. Çatalyürek, {\it Streaming algorithms for k-core decomposition}, in Proceedings of the Very Large Databases Endowment meeting, 6 (2013), pp. 433-444.
[54] M. C. Schmidt, N. F. Samatova, K. Thomas, and B. H. Park, {\it A scalable, parallel algorithm for maximal clique enumeration}, J. Parallel Distributed Comput., 69 (2009), pp. 417-428.
[55] S. B. Seidman, {\it Network structure and minimum degree}, Social Networks, 5 (1983), pp. 269-287.
[56] R. Singh, J. Xu, and B. Berger, {\it Global alignment of multiple protein interaction networks with application to functional orthology detection}, Proc. Natl. Acad. Sci. USA, 105 (2008), pp. 12763-12768.
[57] SocioPatterns, {\it Infectious SocioPatterns Dynamic Contact Networks}, dataset available from http://www.sociopatterns.org/datasets/ (accessed 09/12/12).
[58] E. Tomita and T. Kameda, {\it An efficient branch-and-bound algorithm for finding a maximum clique with computational experiments}, J. Global Optim., 37 (2007), pp. 95-111. · Zbl 1127.90079
[59] E. Tomita, A. Tanaka, and H. Takahashi, {\it The worst-case time complexity for generating all maximal cliques and computational experiments}, Theoret. Comput. Sci., 363 (2006), pp. 28-42. · Zbl 1153.68398
[60] A. L. Traud, P. J. Mucha, and M. A. Porter, {\it Social structure of Facebook networks}, Phys. A, 391 (2012), pp. 4165-4180.
[61] M. A. Trick and D. S. Johnson, eds., {\it Cliques, Coloring, and Satisfiability: Second DIMACS Implementation Challenge}, AMS, New York, 1996. · Zbl 0875.68678
[62] P. Mahadevan, D. Krioukov, M. Fomenkov, X. Dimitropoulos, A. Vahdat, and others {\it The Internet AS-level topology: Three data sources and one definiitive metric}, SIGCOMM, 36 (2006), pp. 17-26.
[63] C. Wilson, B. Boe, A. Sala, K. P. N. Puttaswamy, and B. Y. Zhao, {\it User interactions in social networks and their implications}, in Proceedings of the Fourth EuroSys Conference, ACM, New York, 2009, pp. 205-218.
[64] J. Xiang, C. Guo, and A. Aboulnaga, {\it Scalable maximum clique computation using mapreduce}, in Proceedings of the Conference on Data Engineering (ICDE), IEEE Press, Piscataway, NJ, 2013, pp. 74-85.
[65] Y. Xie and P. S. Yu, {\it Max-clique: A top-down graph-based approach to frequent pattern mining}, in Proceedings of the 10th IEEE International Conference on Data Mining (ICDM), IEEE Press, Piscataway, NJ, 2010, pp. 1139-1144.
[66] J. Yang and J. Leskovec, {\it Defining and evaluating network communities based on ground-truth}, in Proceedings of the 12th IEEE International Conference on Data Mining (ICDM), IEEE Press, Piscataway, NJ, 2012, pp. 745-754.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.