×

The Johnson-Lindenstrauss transform: an empirical study. (English) Zbl 1430.68388

Müller-Hannemann, Matthias (ed.) et al., Proceedings of the 13th workshop on algorithm engineering and experiments (ALENEX ’11), San Francisco, CA, USA, January 22, 2011. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 164-173 (2011).
Summary: The Johnson-Lindenstrauss Lemma states that a set of \(n\) points may be embedded in a space of dimension \(O(\log n/\varepsilon^2)\) while preserving all pairwise distances within a factor of \((1 + \epsilon)\) with high probability. It has inspired a number of proofs that extend the result, simplify it, and improve the efficiency of computing the resulting embedding. The lemma is a critical tool in the realm of dimensionality reduction and high dimensional approximate computational geometry. It is also employed for data mining in domains that analyze intrinsically high dimensional objects such as images and text. However, while algorithms performing the dimensionality reduction have become increasingly sophisticated, there is little understanding of the behavior of these embeddings in practice. In this paper, we present the first comprehensive study of the empirical behavior of algorithms for dimensionality reduction based on the JL Lemma.
Our study answers a number of important questions about the quality of the embeddings and the performance of algorithms used to compute them. Among our key results:
(i)
Determining a likely range for the big-Oh constant in practice for the dimension of the target space, and demonstrating the accuracy of the predicted bounds.
(ii)
Finding ‘best in class’ algorithms over wide ranges of data size and source dimensionality, and showing that these depend heavily on parameters of the data as well its sparsity.
(iii)
Developing the best implementation for each method, making use of non-standard optimized code for key subroutines.
(iv)
Identifying critical computational bottlenecks that can spur further theoretical study of efficient algorithms.

For the entire collection see [Zbl 1280.68032].

MSC:

68U05 Computer graphics; computational geometry (digital and algorithmic aspects)
68T09 Computational aspects of data analysis and big data
68W40 Analysis of algorithms