Abstract
Given n vectors \(x_0, x_1, \ldots , x_{n-1}\) in \(\{0,1\}^{m}\), how to find two vectors whose pairwise Hamming distance is minimum? This problem is known as the Closest Pair Problem. If these vectors are generated uniformly at random except two of them are correlated with Pearson-correlation coefficient \(\rho \), then the problem is called the Light Bulb Problem. In this work, we propose a novel coding-based scheme for the Close Pair Problem. We design both randomized and deterministic algorithms, which achieve the best known running time when the minimum distance is very small compared to the length of input vectors. When applied to the Light Bulb Problem, our algorithms yields state-of-the-art deterministic running time when the Pearson-correlation coefficient \(\rho \) is very large.
A full version of the paper is available at https://arxiv.org/abs/1802.09104.
N. Xie, S. Xu and Y. Xu—Research supported in part by NSF grant 1423034.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Subqubic fast matrix multiplication algorithms are practical only for Strassen-based ones [11, 22]. Even though the recent breakthrough results [20, 35, 42] achieve asymptotically faster than Strassen’s algorithm [36], however, these algorithms are all based on Coppersmith-Winograd’s algorithm [15], and to the best of our knowledge, there are no practical implementations of these trilinear based algorithms.
- 2.
Actually, we only need to “check” when the two adjacent decoded vectors are identical.
- 3.
The Hamming bound, also known as the sphere packing bound, specifies an upper bound on the number codewords a code can have given the block length and the minimum distance of the code.
- 4.
The GV bound is known to be attainable by the random codes.
- 5.
All these results are due to the fact that Valiant’s algorithms are much more robust to weak correlations than other algorithms. Our algorithms therefore do not give improved bounds for these learning problems in the general settings.
References
Abboud, A., Rubinstein, A., Williams, R.: Distributed PCP theorems for hardness of approximation in P. In: Proceedings of 58th Annual IEEE Symposium on Foundations of Computer Science, pp. 25–36 (2017)
Abboud, A., Williams, R., Yu, H.: More applications of the polynomial method to algorithm design. In: Proceedings of 26th ACM-SIAM Symposium on Discrete Algorithms, pp. 218–230 (2015)
Alman, J., Chan, T.M., Williams, R.: Polynomial representations of threshold functions and algorithmic applications. In: Proceedings of 57th Annual IEEE Symposium on Foundations of Computer Science, pp. 467–476 (2016)
Alman, J., Williams, R.: Probabilistic polynomials and Hamming nearest neighbors. In: Proceedings of 56th Annual IEEE Symposium on Foundations of Computer Science, pp. 136–150 (2015)
Andoni, A., Indyk, P.: Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. Commun. ACM 51, 117–122 (2008)
Andoni, A., Indyk, P.: Nearest neighbors in high-dimensional spaces. In: Goodman, J., O’Rourke, J., Toth, C.D. (eds.) Handbook of Discrete and Computational Geometry, 3rd (edn). Chapman and Hall/CRC (2017)
Andoni, A., Laarhoven, T., Razenshteyn, I., Waingarten, E.: Optimal hashing-based time-space trade-offs for approximate near neighbors. In: Proceedings of 28th ACM-SIAM Symposium on Discrete Algorithms, pp. 47–66 (2017)
Arya, S., Mount, D.: Computational geometry: proximity and location. In: Mehta, D.P., Sahni, S. (eds) Handbook of Data Structures and Applications. Chapman and Hall/CRC (2005)
Aston, C., Ralph, D., Lalo, D., Manjeshwar, S., Gramling, B., DeFreese, D., West, A., Branam, D., Thompson, L., Craft, M., et al.: Oligogenic combinations associated with breast cancer risk in women under 53 years of age. Hum. Genet. 116(3), 208–221 (2005)
Becker, A., Joux, A., May, A., Meurer, A.: Decoding random binary linear codes in \(2^{n/20}\): how \(1+1=0\) improves information set decoding. In: EUROCRYPT, pp. 520–536 (2012)
Benson, A., Ballard, G.: A framework for practical parallel fast matrix multiplication. In: ACM SIGPLAN Notices, pp. 42–53 (2015)
Bentley, J.: Multidimensional divide-and-conquer. Commun. ACM 23(4), 214–229 (1980)
Cho, J., Nicolae, D., Gold, L., Fields, C., LaBuda, M., Rohal, P., Pickles, M., Qin, L., Fu, Y., Mann, J., et al.: Identification of novel susceptibility loci for inflammatory bowel disease on chromosomes 1p, 3q, and 4q: evidence for epistasis between 1p and IBD1. Proc. Natl. Acad. Sci. 95(13), 7502–7507 (1998)
Clarkson, K.: A randomized algorithm for closest-point queries. SIAM J. Comput. 17(4), 830–847 (1988)
Coppersmith, D., Winograd, S.: Matrix multiplication via arithmetic progressions. In: Proceedings of 19th Annual ACM Symposium on the Theory of Computing, pp. 1–6 (1987)
Cordell, H.: Detecting gene \(\times \) gene interactions that underlie human diseases. Nat. Rev. Genet. 10(6), 392–404 (2009)
Datar, M., Immorlica, N., Indyk, P., Mirrokni, V.: Locality-sensitive hashing scheme based on \(p\)-stable distributions. In: Proceedings of 20th Symposium on Computational Geometry, pp. 253–262 (2004)
Dubiner, M.: Bucketing coding and information theory for the statistical high-dimensional nearest-neighbor problem. IEEE Trans. Inf. Theory 56(8), 4166–4179 (2008)
Frazer, K., Ballinger, D., Cox, D., Hinds, D., Stuve, L., Gibbs, R., Belmont, J., Boudreau, A., Hardenbol, P., Leal, S., et al.: A second generation human haplotype map of over 3.1 million SNPs. Nature 449(7164), 851–861 (2007)
Le Gall, F.: Faster algorithms for rectangular matrix multiplication. In: Proceedings of 53rd Annual IEEE Symposium on Foundations of Computer Science, pp. 514–523 (2012)
Golin, M.J., Raman, R., Schwarz, C., Smid, M.: Simple randomized algorithms for closest pair problems. Nord. J. Comput. 2(1), 3–27 (1995)
Huang, J., Rice, L., Matthews, D., van de Geijn, R.: Generating families of practical fast matrix multiplication algorithms. In: 2017 IEEE International on Parallel and Distributed Processing Symposium (IPDPS), pp. 656–667 (2017)
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 604–613 (1998)
Karppa, M., Kaski, P., Kohonen, J.: A faster subquadratic algorithm for finding outlier correlations. In: Proceedings of 27th ACM-SIAM Symposium on Discrete Algorithms, pp. 1288–1305 (2016)
Khuller, S., Matias, Y.: A simple randomized sieve algorithm for the closest-pair problem. Inf. Comput. 118(1), 34–37 (1995)
Kushilevitz, E., Ostrovsky, R., Rabani, Y.: Efficient search for approximate nearest neighbor in high dimensional spaces. In: Proceedings of 30th Annual ACM Symposium on the Theory of Computing, pp. 614–623 (1998)
May, A., Ozerov, I.: On computing nearest neighbors with applications to decoding of binary linear codes. In: EUROCRYPT, pp. 203–228 (2015)
Meiser, S.: Point location in arrangements of hyperplanes. Inf. Comput. 106(2), 286–303 (1993)
Motwani, R., Naor, A., Panigrahi, R.: Lower bounds on locality sensitive hashing. In: Proceedings of 22nd Symposium on Computational Geometry, pp. 154–157 (2006)
Musani, S., Shriner, D., Liu, N., Feng, R., Coffey, C., Yi, N., Tiwari, H., Allison, D.: Detection of gene \(\times \) gene interactions in genome-wide association studies of human population data. Hum. Hered. 63(2), 67–84 (2007)
O’Donnell, R., Wu, Y., Zhou, Y.: Optimal lower bounds for locality-sensitive hashing (except when \(q\) is tiny). ACM Trans. Comput. Theory 6(1), 5 (2014)
Paturi, R., Rajasekaran, S., Reif, J.: The light bulb problem. Inf. Comput. 117, 187–192 (1995)
Rabin, M.: Probabilistic algorithms. In: Algorithms and Complexity, pp. 21–30 (1976)
Smid, M.: Closest-point problems in computational geometry. In: Sack, J., Urrutia, J. (eds.) Handbook of computational geometry, pp. 877–935. Elsevier Science Publishing (1997)
Stothers, A.: On the complexity of matrix multiplication. Ph.D. thesis, The University of Edinburgh (2010)
Strassen, V.: Gaussian elimination is not optimal. Numer. Math. 13(4), 354–356 (1969)
Valiant, G.: Finding correlations in subquadratic time, with applications to learning parities and juntas. J. ACM 62, 13 (2015). Earlier version in FOCS 2012
Valiant, L.G.: Functionality in neural nets. In: AAAI, pp. 629–634 (1988)
Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: Proceedings of 46th Annual ACM Symposium on the Theory of Computing, pp. 664–673 (2014)
Williams, R.: The polynomial method in circuit complexity applied to algorithm design (invited talk). In: Conference on Foundation of Software Technology and Theoretical Computer Science (FSTTCS), pp. 47–60 (2014)
Williams, R.: On the difference between closest, furthest, and orthogonal pairs: nearly-linear vs barely-subquadratic complexity. In: Proceedings of 29th ACM-SIAM Symposium on Discrete Algorithms, pp. 1207–1215 (2018)
Williams, V.: Multiplying matrices faster than Coppersmith-Winograd. In: Proceedings of 44th Annual ACM Symposium on the Theory of Computing, pp. 887–898 (2012)
Acknowledgements
We would like to thank Karthik C.S. and the anonymous referees for their valuable comments.
Author information
Authors and Affiliations
Corresponding authors
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Xie, N., Xu, S., Xu, Y. (2018). A New Algorithm for Finding Closest Pair of Vectors (Extended Abstract). In: Fomin, F., Podolskii, V. (eds) Computer Science – Theory and Applications. CSR 2018. Lecture Notes in Computer Science(), vol 10846. Springer, Cham. https://doi.org/10.1007/978-3-319-90530-3_27
Download citation
DOI: https://doi.org/10.1007/978-3-319-90530-3_27
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-90529-7
Online ISBN: 978-3-319-90530-3
eBook Packages: Computer ScienceComputer Science (R0)