Abstract
Locality-sensitive hashing (LSH) has been widely used to solve the problem of c-approximate nearest neighbor search (c-ANNS) in high-dimensional spaces. However, the search performance of LSH degrades as the amount of data grows. To address this drawback, we propose an efficient method called Data Aware Sensitive Hashing (DASH). DASH is a data-dependent hashing algorithm that takes the residual distance prior into account. It leverages this prior knowledge and provides a theoretical guarantee for its search results. Our experimental results on various datasets show that DASH achieves better search performance, with running-time speedups of about 4–40x over other state-of-the-art methods.
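For readers unfamiliar with the baseline, the following is a minimal sketch of classic data-independent LSH with Gaussian (p-stable) projections and of the c-ANNS acceptance condition. It is not DASH and does not model the residual distance prior. The function names (`make_hash`, `build_index`, `query`, `is_c_approximate`), the bucket width `w`, and the use of a single hash table are assumptions made for illustration only.

```python
import math
import random

def make_hash(dim, w, rng):
    """Return one LSH function h(v) = floor((a . v + b) / w).

    a has i.i.d. standard Gaussian entries, b is uniform in [0, w).
    The bucket width w is an illustrative parameter.
    """
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)

    def h(v):
        projection = sum(ai * vi for ai, vi in zip(a, v))
        return math.floor((projection + b) / w)

    return h

def build_index(points, hashes):
    """Bucket point ids by their compound hash key (one table only)."""
    table = {}
    for i, p in enumerate(points):
        key = tuple(h(p) for h in hashes)
        table.setdefault(key, []).append(i)
    return table

def query(q, points, hashes, table):
    """Return the id of the closest point in q's bucket, or None if it is empty."""
    key = tuple(h(q) for h in hashes)
    candidates = table.get(key, [])
    if not candidates:
        return None
    return min(candidates, key=lambda i: math.dist(q, points[i]))

def is_c_approximate(q, points, answer, c):
    """Check the c-ANNS condition: dist(q, answer) <= c * dist(q, exact NN)."""
    best = min(math.dist(q, p) for p in points)
    return math.dist(q, points[answer]) <= c * best

if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.uniform(0.0, 1.0) for _ in range(8)] for _ in range(50)]
    funcs = [make_hash(8, 4.0, rng) for _ in range(4)]
    index = build_index(data, funcs)
    found = query(data[3], data, funcs, index)
    print(found, is_c_approximate(data[3], data, found, 2.0))
```

Practical LSH schemes use many hash tables or collision counting to raise recall. The single table here only illustrates the bucket lookup whose cost grows with the size of the dataset.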
Acknowledgements
The work reported in this paper is partially supported by the NSF of Shanghai under grant number 22ZR1402000, the Fundamental Research Funds for the Central Universities under grant number 2232021A-08, the State Key Laboratory of Computer Architecture (ICT, CAS) under grant number CARCHB 202118, and the Information Development Project of the Shanghai Economic and Information Commission (202002009).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tan, Z., Wang, H., Du, M., Zhang, J. (2023). DASH: Data Aware Locality Sensitive Hashing. In: Li, B., Yue, L., Tao, C., Han, X., Calvanese, D., Amagasa, T. (eds) Web and Big Data. APWeb-WAIM 2022. Lecture Notes in Computer Science, vol 13422. Springer, Cham. https://doi.org/10.1007/978-3-031-25198-6_7
DOI: https://doi.org/10.1007/978-3-031-25198-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25197-9
Online ISBN: 978-3-031-25198-6
eBook Packages: Computer Science, Computer Science (R0)