×

A scalable approximation algorithm for weighted longest common subsequence. (English) Zbl 1512.68464

Sousa, Leonel (ed.) et al., Euro-Par 2021: parallel processing. 27th international conference on parallel and distributed computing, Lisbon, Portugal, September 1–3, 2021. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 12820, 368-384 (2021).
Summary: This work introduces novel parallel methods for weighted longest common subsequence (WLCS) and its generalization, all-substrings WLCS. Previous work developed efficient algorithms for these problems via Monge matrix multiplication, which is a limiting factor for further improvement. Diverging from these approaches, we relax the algorithm’s optimality guarantee in a controlled way, using a different, natural dynamic program which can be sketched and solved in a divide-and-conquer manner that is efficient to parallelize.
Additionally, to compute the base case of our algorithm, we develop a novel and efficient method for all-substrings WLCS inspired by previous work on unweighted all-substrings LCS, exploiting the typically small range of weights.
Our method fits in most parallel models of computation, including the PRAM and the BSP model. To the best of our knowledge this is the fastest \((1-\epsilon )\)-approximation algorithm for all-substrings WLCS and WLCS in BSP. Further, this is the asymptotically fastest parallel algorithm for weighted LCS as the number of processors increases.
For the entire collection see [Zbl 1483.68013].

MSC:

68W32 Algorithms on strings
68W10 Parallel algorithms in computer science
68W25 Approximation algorithms

Software:

GitHub
Full Text: DOI

References:

[1] Abboud, A., Backurs, A., Williams, V.V.: Quadratic-time hardness of LCS and other sequence similarity measures. CoRR abs/1501.07053 (2015). http://arxiv.org/abs/1501.07053
[2] Aggarwal, A.; Klawe, M.; Moran, S.; Shor, P.; Wilber, R., Geometric applications of a matrix-searching algorithm, Algorithmica, 2, 195-208 (1987) · Zbl 0642.68078 · doi:10.1007/BF01840359
[3] Alves, C., Cáceres, E., Song, S.: A coarse-grained parallel algorithm for the all-substrings longest common subsequence problem. Algorithmica 45, 301-335 (2006). doi:10.1007/s00453-006-1216-z · Zbl 1097.68144
[4] Alves, C., Cáceres, E., Song, S.: An all-substrings common subsequence algorithm. Discrete Appl. Math. 156(7), 1025-1035 (2008). doi:10.1016/j.dam.2007.05.056. http://www.sciencedirect.com/science/article/pii/S0166218X07002727 · Zbl 1138.68064
[5] Apostolico, A., Atallah, M.J., Larmore, L.L., McFaddin, S.: Efficient parallel algorithms for string editing and related problems. SIAM J. Comput. 19(5), 968-988 (1990). doi:10.1137/0219066 · Zbl 0711.68055
[6] Bringmann, K., Chaudhury, B.R.: Sketching, streaming, and fine-grained complexity of (weighted) LCS. In: 38th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science, FSTTCS 2018, Ahmedabad, India, 11-13 December 2018, pp. 40:1-40:16 (2018). doi:10.4230/LIPIcs.FSTTCS.2018.40 · Zbl 1528.68426
[7] Bringmann, K., Künnemann, M.: Multivariate fine-grained complexity of longest common subsequence. In: Proceedings of the 29th Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, 7-10 January 2018, pp. 1216-1235 (2018). doi:10.1137/1.9781611975031.79 · Zbl 1403.68369
[8] Crochemore, M.M., Landau, G., Ziv-Ukelson, M.: A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput. 32, 1654-1673 (2003). doi:10.1137/S0097539702402007 · Zbl 1253.74047
[9] Im, S., Moseley, B., Sun, X.: Efficient massively parallel methods for dynamic programming. In: Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing (STOC), Montreal, Canada, pp. 798-811 (2017) · Zbl 1370.68316
[10] Jones, N.: An Introduction to Bioinformatics Algorithms. MIT Press, Cambridge (2004)
[11] Krusche, P., Tiskin, A.: New algorithms for efficient parallel string comparison. In: Proceedings of the 22nd Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), Thira, Santorini, Greece, pp. 209-216 (2010) · Zbl 1172.68674
[12] Krusche, P., Tiskin, A.: Efficient longest common subsequence computation using bulk-synchronous parallelism. In: International Conference on Computational Science and its Applications (Proceedings, Part V), ICCSA 2006, Glasgow, UK, 8-11 May 2006, pp. 165-174 (2006). doi:10.1007/11751649_18 · Zbl 1175.68553
[13] Masek, W.J., Paterson, M.: A faster algorithm computing string edit distances. J. Comput. Syst. Sci. 20(1), 18-31 (1980). doi:10.1016/0022-0000(80)90002-1 · Zbl 0436.68044
[14] Needleman, S.; Wunsch, C., A general method applicable to the search for similarities in the amino acid sequence of two proteins, J. Mol. Biol., 48, 443-453 (1970) · doi:10.1016/0022-2836(70)90057-4
[15] Program, N.G.S.: DNA sequencing costs: Data (2017). https://www.genome.gov/sequencingcostsdata/
[16] Rajko, S.; Aluru, S., Space and time optimal parallel sequence alignments, IEEE Trans. Parallel Distrib. Syst., 15, 12, 1070-1081 (2004) · doi:10.1109/TPDS.2004.86
[17] Russo, L., Monge properties of sequence alignment, Theor. Comput. Sci., 423, 30-49 (2012) · Zbl 1253.68374 · doi:10.1016/j.tcs.2011.12.068
[18] http://jaligner.sourceforge.net/
[19] https://github.com/mengyao/Complete-Striped-Smith-Waterman-Library
[20] https://github.com/Martinsos/opal
[21] Schmidt, J., All highest scoring paths in weighted graphs and their applications to finding all approximate repeats in strings, SIAM J. Comput., 27, 972-992 (1998) · Zbl 0907.68076 · doi:10.1137/S0097539795288489
[22] Smith, T.; Waterman, M., Identification of common molecular subsequences, J. Mol. Biol., 147, 1, 195-197 (1981) · doi:10.1016/0022-2836(81)90087-5
[23] States, D.; Gish, W.; Altschul, S., Improved sensitivity of nucleic acid database searches using application-specific scoring matrices, METHODS Companion Meth. Enzymol., 3, 66-70 (1991) · doi:10.1016/S1046-2023(05)80165-3
[24] Tiskin, A.: Semi-local string comparison: algorithmic techniques and applications, ch 6.1 (2013). https://arxiv.org/abs/0707.3619, v21 · Zbl 1158.68054
[25] Tiskin, A., Fast distance multiplication of unit-monge matrices, Algorithmica, 71, 4, 859-888 (2015) · Zbl 1325.68259 · doi:10.1007/s00453-013-9830-z
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.