skip to main content
column

Graph stream algorithms: a survey

Published: 13 May 2014 Publication History

Abstract

Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results; identify general techniques; and highlight some simple algorithms that illustrate basic ideas.

References

[1]
K. J. Ahn. Analyzing massive graphs in the semi-streaming model. PhD thesis, University of Pennsylvania, Philadelphia, Pennsylvania, Jan. 2013.
[2]
K. J. Ahn and S. Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages and Programming, pages 328--338, 2009.
[3]
K. J. Ahn and S. Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. CoRR, abs/1307.4359, 2013.
[4]
K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput., 222:59--79, 2013.
[5]
K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In ACM-SIAM Symposium on Discrete Algorithms, pages 459--467, 2012.
[6]
K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In ACM Symposium on Principles of Database Systems, pages 5--14, 2012.
[7]
K. J. Ahn, S. Guha, and A. McGregor. Spectral sparsification of dynamic graph streams. In International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, 2013.
[8]
M. Badoiu, A. Sidiropoulos, and V. Vaikuntanathan. Computing s-t min-cuts in a semi-streaming model. Manuscript.
[9]
B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. PVLDB, 5(5):454--465, 2012.
[10]
Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In ACM-SIAM Symposium on Discrete Algorithms, pages 623--632, 2002.
[11]
S. Baswana. Streaming algorithm for graph spanners - single pass and constant processing time per edge. Inf. Process. Lett., 106(3):110--114, 2008.
[12]
J. D. Batson, D. A. Spielman, and N. Srivastava. Twice-ramanujan sparsifiers. SIAM J. Comput., 41(6):1704--1721, 2012.
[13]
L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 4(3), 2010.
[14]
A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in ¿O(n2) time. In ACM Symposium on Theory of Computing, pages 47--55, 1996.
[15]
B. Bollobás. Extremal Graph Theory. Academic Press, New York, 1978.
[16]
V. Braverman and R. Ostrovsky. Smooth histograms for sliding windows. In IEEE Symposium on Foundations of Computer Science, pages 283--293, 2007.
[17]
V. Braverman, R. Ostrovsky, and D. Vilenchik. How hard is counting triangles in the streaming model? In International Colloquium on Automata, Languages and Programming, pages 244--254, 2013.
[18]
L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 253--262, 2006.
[19]
A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In ACM Symposium on Theory of Computing, pages 641--650, 2008.
[20]
A. Chakrabarti and S. Kale. Submodular maximization meets streaming: Matchings, matroids, and more. CoRR, arXiv:1309.2038, 2013.
[21]
G. Cormode and S. Muthukrishnan. Space efficient mining of multigraph streams. In ACM Symposium on Principles of Database Systems, pages 271--282, 2005.
[22]
M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337--348, 2013.
[23]
M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011.
[24]
M. Elkin and J. Zhang. Efficient algorithms for constructing (1 + e, ß)-spanners in the distributed and streaming models. Distributed Computing, 18(5):375--385, 2006.
[25]
L. Epstein, A. Levin, J. Mestre, and D. Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discrete Math., 25(3):1251--1265, 2011.
[26]
L. Epstein, A. Levin, D. Segev, and O. Weimann. Improved bounds for online preemptive matching. In STACS, pages 389--399, 2013.
[27]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207--216, 2005.
[28]
J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709--1727, 2008.
[29]
W. S. Fung, R. Hariharan, N. J. A. Harvey, and D. Panigrahi. A general framework for graph sparsification. In ACM Symposium on Theory of Computing, pages 71--80, 2011.
[30]
A. Goel, M. Kapralov, and S. Khanna. On the communication and streaming complexity of maximum bipartite matching. In ACM-SIAM Symposium on Discrete Algorithms, pages 468--485, 2012.
[31]
A. Goel, M. Kapralov, and I. Post. Single pass sparsification in the streaming model with edge deletions. CoRR, abs/1203.4900, 2012.
[32]
O. Goldreich. Introduction to testing graph properties. In O. Goldreich, editor, Studies in Complexity and Cryptography, volume 6650 of Lecture Notes in Computer Science, pages 470--506. Springer, 2011.
[33]
V. Guruswami and K. Onak. Superlinear lower bounds for multipass graph processing. In IEEE Conference on Computational Complexity, pages 287--298, 2013.
[34]
B. V. Halldórsson, M. M. Halldórsson, E. Losievskaja, and M. Szegedy. Streaming algorithms for independent sets. In International Colloquium on Automata, Languages and Programming, pages 641--652, 2010.
[35]
M. M. Halldórsson, X. Sun, M. Szegedy, and C. Wang. Streaming and communication complexity of clique approximation. In International Colloquium on Automata, Languages and Programming, pages 449--460, 2012.
[36]
M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, pages 107--118, 1999.
[37]
M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In KDD, pages 589--597, 2013.
[38]
M. Jha, C. Seshadhri, and A. Pinar. When a graph is not so simple: Counting triangles in multigraph streams. CoRR, arXiv:1310.7665, 2013.
[39]
H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710--716, 2005.
[40]
H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In ACM Symposium on Principles of Database Systems, pages 49--58, 2011.
[41]
D. M. Kane, K. Mehlhorn, T. Sauerwald, and H. Sun. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages and Programming, pages 598--609, 2012.
[42]
M. Kapralov. Better bounds for matchings in the streaming model. In ACM-SIAM Symposium on Discrete Algorithms, pages 1679--1697, 2013.
[43]
M. Kapralov, S. Khanna, and M. Sudan. Approximating matching size from random streams. In ACM-SIAM Symposium on Discrete Algorithms, 2014.
[44]
B. M. Kapron, V. King, and B. Mountjoy. Dynamic graph connectivity in polylogarithmic worst case time. In ACM-SIAM Symposium on Discrete Algorithms, pages 1131--1142, 2013.
[45]
D. R. Karger. Random sampling in cut, flow, and network design problems. In ACM Symposium on Theory of Computing, pages 648--657, 1994.
[46]
J. A. Kelner and A. Levin. Spectral sparsification in the semi-streaming setting. Theory Comput. Syst., 53(2):243--262, 2013.
[47]
C. Konrad, F. Magniez, and C. Mathieu. Maximum matching in semi-streaming with few passes. In APPROX-RANDOM, pages 231--242, 2012.
[48]
C. Konrad and A. Rosén. Approximating semi-matchings in streaming and in two-party communication. In International Colloquium on Automata, Languages and Programming, pages 637--649, 2013.
[49]
K. Kutzkov and R. Pagh. On the streaming complexity of computing local clustering coefficients. In WSDM, pages 677--686, 2013.
[50]
M. Manjunath, K. Mehlhorn, K. Panagiotou, and H. Sun. Approximate counting of cycles in streams. In European Symposium on Algorithms, pages 677--688, 2011.
[51]
A. McGregor. Finding graph matchings in data streams. In APPROX-RANDOM, pages 170--181, 2005.
[52]
S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, 2006.
[53]
R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett., 112(7):277--281, 2012.
[54]
A. Pavan, K. Tangwongsan, S. Tirthapura, and K.-L. Wu. Counting and sampling triangles from a graph stream. In International Conference on Very Large Data Bases, 2013.
[55]
J. M. Phillips, E. Verbin, and Q. Zhang. Lower bounds for number-in-hand multiparty communication complexity, made easy. In ACM-SIAM Symposium on Discrete Algorithms, pages 486--501, 2012.
[56]
A. D. Sarma, S. Gollapudi, and R. Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13, 2011.
[57]
A. D. Sarma, R. J. Lipton, and D. Nanongkai. Best-order streaming model. Theor. Comput. Sci., 412(23):2544--2555, 2011.
[58]
D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM J. Comput., 40(6):1913--1926, 2011.
[59]
D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981--1025, 2011.
[60]
R. Tarjan. Data Structures and Network Algorithms. SIAM, Philadelphia, 1983.
[61]
A. B. Varadaraja. Buyback problem - approximate matroid intersection with cancellation costs. In International Colloquium on Automata, Languages and Programming, pages 379--390, 2011.
[62]
M. Zelke. Weighted matching in the semi-streaming model. Algorithmica, 62(1-2):1--20, 2012.

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM SIGMOD Record
ACM SIGMOD Record  Volume 43, Issue 1
March 2014
71 pages
ISSN:0163-5808
DOI:10.1145/2627692
Issue’s Table of Contents

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2014
Published in SIGMOD Volume 43, Issue 1

Check for updates

Qualifiers

  • Column

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)202
  • Downloads (Last 6 weeks)31
Reflects downloads up to 21 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)D3-GNN: Dynamic Distributed Dataflow for Streaming Graph Neural NetworksProceedings of the VLDB Endowment10.14778/3681954.368196117:11(2764-2777)Online publication date: 1-Jul-2024
  • (2024)Incremental Sliding Window Connectivity over Streaming GraphsProceedings of the VLDB Endowment10.14778/3675034.367504017:10(2473-2486)Online publication date: 1-Jun-2024
  • (2024)Querying Structural Diversity in Streaming GraphsProceedings of the VLDB Endowment10.14778/3641204.364121317:5(1034-1046)Online publication date: 1-Jan-2024
  • (2024)Streaming Graph Algorithms in the Massively Parallel Computation ModelProceedings of the 43rd ACM Symposium on Principles of Distributed Computing10.1145/3662158.3662770(496-507)Online publication date: 17-Jun-2024
  • (2024)On Querying Historical Connectivity in Temporal GraphsProceedings of the ACM on Management of Data10.1145/36549602:3(1-25)Online publication date: 30-May-2024
  • (2024)Tight Lower Bounds for Directed Cut Sparsification and Distributed Min-CutProceedings of the ACM on Management of Data10.1145/36511482:2(1-18)Online publication date: 14-May-2024
  • (2024)Mining Temporal NetworksCompanion Proceedings of the ACM Web Conference 202410.1145/3589335.3641245(1260-1263)Online publication date: 13-May-2024
  • (2024)Online Spanners in Metric SpacesSIAM Journal on Discrete Mathematics10.1137/22M153457238:1(1030-1056)Online publication date: 12-Mar-2024
  • (2024)Decentralized Low-Stretch Trees via Low Diameter Graph DecompositionsSIAM Journal on Computing10.1137/22M148903453:2(247-286)Online publication date: 13-Mar-2024
  • (2024)Streaming Local Community Detection Through Approximate ConductanceIEEE Transactions on Big Data10.1109/TBDATA.2023.331025110:1(12-22)Online publication date: Feb-2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media