×

Lightweight distributed suffix array construction. (English) Zbl 1430.68080

Kobourov, Stephen (ed.) et al., Proceedings of the 21st workshop on algorithm engineering and experiments, ALENEX ’19, San Diego, CA, USA, January 7–8, 2019. Philadelphia, PA: Society for Industrial and Applied Mathematics (SIAM). 27-38 (2019).
Summary: We present two new distributed suffix array construction algorithms. One of our algorithms requires only half as much memory as its competitor (PSAC) [P. Flick and S. Aluru, “Parallel distributed memory construction of suffix and longest common prefix arrays”, in: Proceedings of the international conference for high performance computing, networking, storage and analysis, SC 2015. New York, NY: Association for Computing Machinery (ACM). Paper No. 16, 10 p. (2015; doi:10.1145/2807591.2807609)], while achieving similar speed. In practice, we can compute on the same hardware suffix arrays for text twice as large as PSAC. The other algorithm still requires less memory than PSAC but is faster on some instances. As a by-product, we also engineered the first distributed string sorting algorithm. All of our algorithms are tested on text collections of up to 115 GB and running on 1280 cores.
For the entire collection see [Zbl 1409.68020].

MSC:

68P15 Database theory
68P10 Searching and sorting
68W15 Distributed algorithms
Full Text: DOI