×

Balanced hashing, color coding and approximate counting. (English) Zbl 1273.68270

Chen, Jianer (ed.) et al., Parameterized and exact computation. 4th international workshop, IWPEC 2009, Copenhagen, Denmark, September 10–11, 2009. Revised selected papers. Berlin: Springer (ISBN 978-3-642-11268-3/pbk). Lecture Notes in Computer Science 5917, 1-16 (2009).
Summary: Color coding is an algorithmic technique for deciding efficiently if a given input graph contains a path of a given length (or another small subgraph of constant tree-width). Applications of the method in computational biology motivate the study of similar algorithms for counting the number of copies of a given subgraph. While it is unlikely that exact counting of this type can be performed efficiently, as the problem is \(\#W[1]\)-complete even for paths, approximate counting is possible, and leads to the investigation of an intriguing variant of families of perfect hash functions. A family of functions from \([n]\) to \([k]\) is an \((\varepsilon ,k)\)-balanced family of hash functions, if there exists a positive \(T\) so that for every \(K \subset [n]\) of size \(|K| = k\), the number of functions in the family that are one-to-one on \(K\) is between \((1 - \varepsilon )T\) and \((1 + \varepsilon )T\). The family is perfectly \(k\)-balanced if it is \((0,k)\)-balanced.
We show that every such perfectly \(k\)-balanced family is of size at least \(c(k) n^{\lfloor k/2 \rfloor}\), and that for every \(\varepsilon>\frac{1}{\text{poly}(k)}\) there are explicit constructions of \((\varepsilon ,k)\)-balanced families of hash functions from \([n]\) to \([k]\) of size \(e ^{(1 + o(1))k } \log n\). This is tight up to the \(o(1)\)-term in the exponent, and supplies deterministic polynomial time algorithms for approximately counting the number of paths or cycles of a specified length \(k\) (or copies of any graph \(H\) with \(k\) vertices and bounded tree-width) in a given input graph of size \(n\), up to relative error \(\varepsilon \), for all \(k \leq O(\log n)\).
For the entire collection see [Zbl 1178.68005].

MSC:

68R05 Combinatorics in computer science
68Q25 Analysis of algorithms and problem complexity
68R10 Graph theory (including graph drawing) in computer science
92D10 Genetics and epigenetics
Full Text: DOI

References:

[1] Alon, N.; Babai, L.; Itai, A., A fast and simple randomized parallel algorithm for the maximal independent set problem, Journal of Algorithms, 7, 4, 567-583 (1986) · Zbl 0631.68063 · doi:10.1016/0196-6774(86)90019-2
[2] Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F., Sahinalp, S.C.: Biomolecular network motif counting and discovery by color coding. In: ISMB (Supplement of Bioinformatics), pp. 241-249 (2008)
[3] Alon, N.; Goldreich, O.; Håstad, J.; Peralta, R., Simple construction of almost k-wise independent random variables, Random Struct. Algorithms, 3, 3, 289-304 (1992) · Zbl 0755.60002 · doi:10.1002/rsa.3240030308
[4] Alon, N.; Gutner, S.; Arge, L.; Cachin, C.; Jurdziński, T.; Tarlecki, A., Balanced families of perfect hash functions and their applications, Automata, Languages and Programming, 435-446 (2007), Heidelberg: Springer, Heidelberg · Zbl 1171.68868 · doi:10.1007/978-3-540-73420-8_39
[5] Alon, N.; Roichman, Y., Random Cayley graphs and expanders, Random Struct. Algorithms, 5, 2, 271-285 (1994) · Zbl 0798.05048 · doi:10.1002/rsa.3240050203
[6] Alon, N.; Schwartz, O.; Shapira, A., An elementary construction of constant-degree expanders, Combin. Probab. Comput., 17, 3, 319-327 (2008) · Zbl 1194.05021 · doi:10.1017/S0963548307008851
[7] Alon, N.; Spencer, J. H., The probabilistic method (2008), Hoboken: John Wiley & Sons Inc., Hoboken · Zbl 1148.05001
[8] Alon, N.; Yuster, R.; Zwick, U., Color-coding, Journal of the ACM, 42, 4, 844-856 (1995) · Zbl 0885.68116 · doi:10.1145/210332.210337
[9] Alon, N.; Yuster, R.; Zwick, U., Finding and counting given length cycles, Algorithmica, 17, 3, 209-223 (1997) · Zbl 0865.68093 · doi:10.1007/BF02523189
[10] Arvind, V.; Raman, V.; Bose, P.; Morin, P., Approximation algorithms for some parameterized counting problems, Algorithms and Computation, 453-464 (2002), Heidelberg: Springer, Heidelberg · Zbl 1019.68135 · doi:10.1007/3-540-36136-7_40
[11] Björklund, A.; Husfeldt, T.; Kaski, P.; Koivisto, M.; Fiat, A.; Sanders, P., Counting paths and packings in halves, ESA 2009, 578-586 (2009), Heidelberg: Springer, Heidelberg · Zbl 1256.05230 · doi:10.1007/978-3-642-04128-0_52
[12] Feller, W., An introduction to probability theory and its applications (1968), New York: Wiley, New York · Zbl 0155.23101
[13] Flum, J.; Grohe, M., The parameterized complexity of counting problems, SIAM Journal on Computing, 33, 4, 892-922 (2004) · Zbl 1105.68042 · doi:10.1137/S0097539703427203
[14] Fredman, M. L.; Komlós, J.; Szemerédi, E., Storing a sparse table with O(1) worst case access time, Journal of the ACM, 31, 3, 538-544 (1984) · Zbl 0629.68068 · doi:10.1145/828.1884
[15] Hall, M. Jr., Combinatorial theory (1986), New York: John Wiley & Sons Inc.. A Wiley-Interscience Publication, New York · Zbl 0588.05001
[16] Hoory, S.; Linial, N.; Wigderson, A., Expander graphs and their applications, Bull. Amer. Math. Soc (N.S.), 43, 4, 439-561 (2006) · Zbl 1147.68608 · doi:10.1090/S0273-0979-06-01126-8
[17] Hüffner, F., Wernicke, S., Zichner, T.: Algorithm engineering for color-coding to facilitate signaling pathway detection. In: Sankoff, D., Wang, L., Chin, F. (eds.) APBC. Advances in Bioinformatics and Computational Biology, vol. 5, pp. 277-286. Imperial College Press (2007) · Zbl 1170.68048
[18] Impagliazzo, R.; Paturi, R., On the complexity of k-SAT, J. Comput. Syst. Sci, 62, 2, 367-375 (2001) · Zbl 0990.68079 · doi:10.1006/jcss.2000.1727
[19] Impagliazzo, R.; Paturi, R.; Zane, F., Which problems have strongly exponential complexity?, J. Comput. Syst. Sci., 63, 4, 512-530 (2001) · Zbl 1006.68052 · doi:10.1006/jcss.2001.1774
[20] Jukna, S., Extremal combinatorics (2001), Berlin: Springer, Berlin · Zbl 0978.05001
[21] Naor, M., Schulman, L.J., Srinivasan, A.: Splitters and near-optimal derandomization. In: 36th Annual Symposium on Foundations of Computer Science, pp. 182-191 (1995) · Zbl 0938.68932
[22] Schmidt, J. P.; Siegel, A., The spatial complexity of oblivious k-probe hash functions, SIAM Journal on Computing, 19, 5, 775-786 (1990) · Zbl 0711.68039 · doi:10.1137/0219054
[23] Scott, J.; Ideker, T.; Karp, R. M.; Sharan, R., Efficient algorithms for detecting signaling pathways in protein interaction networks, Journal of Computational Biology, 13, 2, 133-144 (2006) · doi:10.1089/cmb.2006.13.133
[24] Sharan, R.; Ideker, T., Modeling cellular machinery through biological network comparison, Nature Biotechnology, 24, 4, 427-433 (2006) · doi:10.1038/nbt1196
[25] Shlomi, T.; Segal, D.; Ruppin, E.; Sharan, R., QPath: a method for querying pathways in a protein-protein interaction network, BMC Bioinformatics, 7, 199 (2006) · doi:10.1186/1471-2105-7-199
[26] Vassilevska, V.; Williams, R.; Mitzenmacher, M., Finding, minimizing, and counting weighted subgraphs, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, 455-464 (2009), New York: ACM, New York · Zbl 1304.05102 · doi:10.1145/1536414.1536477
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.