skip to main content
research-article

The Power of Simple Tabulation Hashing

Published: 01 June 2012 Publication History

Abstract

Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to yield the desired theoretical guarantees are often neither simple nor practical. Here we show that the simplest possible tabulation hashing provides unexpectedly strong guarantees.
The scheme itself dates back to Zobrist in 1970 who used it for game playing programs. Keys are viewed as consisting of c characters. We initialize c tables H1, ..., Hc mapping characters to random hash codes. A key x = (x1, ..., xc) is hashed to H1[x1] ⊕ ⋯ ⊕ Hc[xc], where ⊕ denotes bit-wise exclusive-or.
While this scheme is not even 4-independent, we show that it provides many of the guarantees that are normally obtained via higher independence, for example, Chernoff-type concentration, min-wise hashing for estimating set intersection, and cuckoo hashing.

References

[1]
Alon, N., Matias, Y., and Szegedy, M. 1999. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci. 58, 1, 209--223.
[2]
Azar, Y., Broder, A. Z., Karlin, A. R., and Upfal, E. 1999. Balanced allocations. SIAM J. Comput. 29, 1, 180--200.
[3]
Braverman, V., Chung, K.-M., Liu, Z., Mitzenmacher, M., and Ostrovsky, R. 2010. AMS without 4-wise independence on product domains. In Proceedings of the 27th Symposium on Theoretical Aspects of Computer Science (STACS). 119--130.
[4]
Broder, A. Z., Charikar, M., Frieze, A. M., and Mitzenmacher, M. 2000. Min-wise independent permutations. J. Comput. Syst. Sci. 60, 3, 630--659.
[5]
Carter, L. and Wegman, M. N. 1979. Universal classes of hash functions. J. Comput. Syst. Sci. 18, 2, 143--154.
[6]
Cohen, J. S. and Kane, D. M. 2009. Bounds on the independence required for cuckoo hashing. Manuscript. http://math.stanford.edu/~dankane/cuchkoohashing.pdf.
[7]
Dietzfelbinger, M. 1996. Universal hashing and k-wise independent random variables via integer arithmetic without primes. In Proceedings of the 13th Symposium on Theoretical Aspects of Computer Science (STACS). 569--580.
[8]
Dietzfelbinger, M., and Rink, M. 2009. Applications of a splitting trick. In Proceedings of the 36th International Colloquium on Automata, Languages and Programming (ICALP). 354--365.
[9]
Dietzfelbinger, M. and Schellbach, U. 2009. On risks of using cuckoo hashing with simple universal hash classes. In Proceedings of the 20th ACM/SIAM Symposium on Discrete Algorithms (SODA). 795--804.
[10]
Dietzfelbinger, M. and Woelfel, P. 2003. Almost random graphs with simple hash functions. In Proceedings of the 25th ACM Symposium on Theory of Computing (STOC). 629--638.
[11]
Dietzfelbinger, M., Hagerup, T., Katajainen, J., and Penttonen, M. 1997. A reliable randomized algorithm for the closest-pair problem. J. Algor. 25, 1, 19--51.
[12]
Gerber, L. 1968. An extension of Bernoulli’s inequality. Amer. Math. Month. 75, 875--876.
[13]
Indyk, P. 2001. A small approximately min-wise independent family of hash functions. J. Algor. 38, 1, 84--90.
[14]
Karloff, H. J. and Raghavan, P. 1993. Randomized algorithms and pseudorandom numbers. J. ACM 40, 3, 454--476.
[15]
Knuth, D. E. 1963. Notes on open addressing. Unpublished memorandum. http://citeseer.ist.psu.edu/knuth63notes.html.
[16]
Mitzenmacher, M., and Vadhan, S. P. 2008. Why simple hash functions work: exploiting the entropy in a data stream. In Proceedings of the 19th ACM/SIAM Symposium on Discrete Algorithms (SODA). 746--755.
[17]
Motwani, R. and Raghavan, P. 1995. Randomized Algorithms. Cambridge University Press.
[18]
Pagh, A., Pagh, R., and Ružić, M. 2009. Linear probing with constant independence. SIAM J. Comput. 39, 3, 1107--1120.
[19]
Pagh, R., and Rodler, F. F. 2004. Cuckoo hashing. J. Algor. 51, 2, 122--144.
[20]
Pǎtraşcu, M., and Thorup, M. 2010. On the k-independence required by linear probing and minwise independence. In Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP). 715--726.
[21]
Schmidt, J. P., Siegel, A., and Srinivasan, A. 1995. Chernoff-Hoeffding bounds for applications with limited independence. SIAM J. Discr. Math. 8, 2, 223--250.
[22]
Siegel, A. 2004. On universal classes of extremely random constant-time hash functions. SIAM J. Comput. 33, 3, 505--543.
[23]
Thorup, M. 2000. Even strongly universal hashing is pretty fast. In Proceedings of the 11th ACM/SIAM Symposium on Discrete Algorithms (SODA). 496--497.
[24]
Thorup, M. 2009. String hashing for linear probing. In Proceedings of the 20th ACM/SIAM Symposium on Discrete Algorithms (SODA). 655--664.
[25]
Thorup, M., and Zhang, Y. 2012. Tabulation-based 5-independent hashing with applications to linear probing and second moment estimation. SIAM J. Comput. 41, 2, 293--331.
[26]
Wegman, M. N., and Carter, L. 1981. New classes and applications of hash functions. J. Comput. Syst. Sci. 22, 3, 265--279.
[27]
Zobrist, A. L. 1970. A new hashing method with application for game playing. Tech. rep. 88, Computer Sciences Department, University of Wisconsin, Madison, WI.

Cited By

View all
  • (2024)Accelerating Aggregation Using a Real Processing-in-Memory System2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00300(3920-3932)Online publication date: 13-May-2024
  • (2024)Metagenomic functional profiling: to sketch or not to sketch?Bioinformatics10.1093/bioinformatics/btae39740:Supplement_2(ii165-ii173)Online publication date: 4-Sep-2024
  • (2024)A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queriesMicroprocessors and Microsystems10.1016/j.micpro.2023.104992105(104992)Online publication date: Mar-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Journal of the ACM
Journal of the ACM  Volume 59, Issue 3
June 2012
180 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/2220357
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2012
Accepted: 01 March 2012
Revised: 01 November 2011
Received: 01 June 2011
Published in JACM Volume 59, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Hashing
  2. tabulation

Qualifiers

  • Research-article
  • Research
  • Refereed

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)80
  • Downloads (Last 6 weeks)10
Reflects downloads up to 23 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Accelerating Aggregation Using a Real Processing-in-Memory System2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00300(3920-3932)Online publication date: 13-May-2024
  • (2024)Metagenomic functional profiling: to sketch or not to sketch?Bioinformatics10.1093/bioinformatics/btae39740:Supplement_2(ii165-ii173)Online publication date: 4-Sep-2024
  • (2024)A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queriesMicroprocessors and Microsystems10.1016/j.micpro.2023.104992105(104992)Online publication date: Mar-2024
  • (2024)Improving Compressed Matrix Multiplication using Control Variate MethodInformation Processing Letters10.1016/j.ipl.2024.106517(106517)Online publication date: Jun-2024
  • (2024)Unlocking the Lookup Singularity with LassoAdvances in Cryptology – EUROCRYPT 202410.1007/978-3-031-58751-1_7(180-209)Online publication date: 26-May-2024
  • (2023)LONE SAMPLERProceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence10.1609/aaai.v37i7.26014(8413-8420)Online publication date: 7-Feb-2023
  • (2023)Iceberg Hashing: Optimizing Many Hash-Table Criteria at OnceJournal of the ACM10.1145/362581770:6(1-51)Online publication date: 30-Nov-2023
  • (2023)Brief Announcement: A Parallel Architecture for Dynamic Approximate MembershipProceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures10.1145/3558481.3591312(291-294)Online publication date: 17-Jun-2023
  • (2023)Locally Uniform Hashing2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS)10.1109/FOCS57990.2023.00089(1440-1470)Online publication date: 6-Nov-2023
  • (2023)Cuckoo Hashing in Cryptography: Optimal Parameters, Robustness and ApplicationsAdvances in Cryptology – CRYPTO 202310.1007/978-3-031-38551-3_7(197-230)Online publication date: 20-Aug-2023
  • Show More Cited By

View Options

Get Access

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media