×

Reconstructing trees from traces. (English) Zbl 1530.68209

Summary: We study the problem of learning a node-labeled tree given independent traces from an appropriately defined deletion channel. This problem, tree trace reconstruction, generalizes string trace reconstruction, which corresponds to the tree being a path. For many classes of trees, including complete trees and spiders, we provide algorithms that reconstruct the labels using only a polynomial number of traces. This exhibits a stark contrast to known results on string trace reconstruction, which require exponentially many traces, and where a central open problem is to determine whether a polynomial number of traces suffice. Our techniques combine novel combinatorial and complex analytic methods.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68Q25 Analysis of algorithms and problem complexity
94A40 Channel models (including quantum) in information and communication theory

References:

[1] Abrahao, B., Chierichetti, F., Kleinberg, R. and Panconesi, A. (2013). Trace complexity of network inference. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 491-499. ACM, New York.
[2] Ahlfors, L. V. (1953). Complex Analysis. An Introduction to the Theory of Analytic Functions of One Complex Variable. McGraw-Hill Book Company, Inc., New York-Toronto-London. · Zbl 0052.07002
[3] Anavy, L., Vaknin, I., Atar, O., Amit, R. and Yakhini, Z. (2019). Data storage in DNA with fewer synthesis cycles using composite DNA letters. Nat. Biotechnol. 37 1229-1236. · doi:10.1038/s41587-019-0240-x
[4] Andoni, A., Daskalakis, C., Hassidim, A. and Roch, S. (2012). Global alignment of molecular sequences via ancestral state reconstruction. Stochastic Process. Appl. 122 3852-3874. · Zbl 1250.92034 · doi:10.1016/j.spa.2012.08.004
[5] Ban, F., Chen, X., Freilich, A., Servedio, R. A. and Sinha, S. (2019). Beyond trace reconstruction: Population recovery from the deletion channel. In Proceedings of the 60th Annual Symposium on Foundations of Computer Science (FOCS) 745-768. IEEE, New York.
[6] Batu, T., Kannan, S., Khanna, S. and McGregor, A. (2004). Reconstructing strings from random traces. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms 910-918. ACM, New York. · Zbl 1318.68206
[7] Bille, P. (2005). A survey on tree edit distance and related problems. Theoret. Comput. Sci. 337 217-239. · Zbl 1078.68152 · doi:10.1016/j.tcs.2004.12.030
[8] Borwein, P. and Erdélyi, T. (1997). Littlewood-type problems on subarcs of the unit circle. Indiana Univ. Math. J. 46 1323-1346. · Zbl 0930.30005 · doi:10.1512/iumj.1997.46.1435
[9] Ceze, L., Nivala, J. and Strauss, K. (2019). Molecular digital data storage using DNA. Nat. Rev. Genet. 20 456-466. · doi:10.1038/s41576-019-0125-3
[10] Chase, Z. (2019). New lower bounds for trace reconstruction. Preprint. Available at https://arxiv.org/abs/1905.03031. · Zbl 1469.62189
[11] Cheraghchi, M., Gabrys, R., Milenkovic, O. and Ribeiro, J. (2020). Coded trace reconstruction. IEEE Trans. Inf. Theory 66 6084-6103. · Zbl 1452.94041 · doi:10.1109/TIT.2020.2996377
[12] Church, G. M., Gao, Y. and Kosuri, S. (2012). Next-generation digital information storage in DNA. Science 337 1628.
[13] Davies, S., Rácz, M. Z. and Rashtchian, C. (2019). Reconstructing trees from traces. In Proceedings of the 32nd Conference on Learning Theory (COLT) 961-978.
[14] De Anindya, A., O’Donnell, R. and Servedio, R. A. (2017). Optimal mean-based algorithms for trace reconstruction. In STOC’17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 1047-1056. ACM, New York. · Zbl 1369.68202 · doi:10.1145/3055399.3055450
[15] De Anindya, A., O’Donnell, R. and Servedio, R. A. (2019). Optimal mean-based algorithms for trace reconstruction. Ann. Appl. Probab. 29 851-874. · Zbl 1416.62196 · doi:10.1214/18-AAP1394
[16] Dudík, M. and Schulman, L. J. (2003). Reconstruction from subsequences. J. Combin. Theory Ser. A 103 337-348. · Zbl 1039.68098 · doi:10.1016/S0097-3165(03)00103-1
[17] Erlich, Y. and Zielinski, D. (2017). DNA fountain enables a robust and efficient storage architecture. Science 355 950-954. · doi:10.1126/science.aaj2038
[18] Garnett, J. B. and Marshall, D. E. (2005). Harmonic Measure. New Mathematical Monographs 2. Cambridge Univ. Press, Cambridge. · Zbl 1077.31001 · doi:10.1017/CBO9780511546617
[19] Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E. M., Sipos, B. and Birney, E. (2013). Towards practical, high-capacity, low-maintenance information storage in synthesized DNA. Nature 494 77-80.
[20] Hartung, L., Holden, N. and Peres, Y. (2018). Trace reconstruction with varying deletion probabilities. In 2018 Proceedings of the Fifteenth Workshop on Analytic Algorithmics and Combinatorics (ANALCO) 54-61. SIAM, Philadelphia, PA. · Zbl 1430.68101 · doi:10.1137/1.9781611975062.6
[21] Holden, N. and Lyons, R. (2020). Lower bounds for trace reconstruction. Ann. Appl. Probab. 30 503-525. · Zbl 1445.62014 · doi:10.1214/19-AAP1506
[22] Holden, N., Pemantle, R. and Peres, Y. (2018). Subpolynomial trace reconstruction for random strings and arbitrary deletion probability. In Proceedings of the 31st Conference on Learning Theory (COLT) 1799-1840. · Zbl 1472.68223
[23] Holenstein, T., Mitzenmacher, M., Panigrahy, R. and Wieder, U. (2008). Trace reconstruction with constant deletion probability and related results. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms 389-398. ACM, New York. · Zbl 1192.94064
[24] Karau, P. and Tabard-Cossa, V. (2018). Capture and translocation characteristics of short branched DNA labels in solid-state nanopores. ACS Sens 3 1308-1315. · doi:10.1021/acssensors.8b00165
[25] Kelly, P. J. (1957). A congruence theorem for trees. Pacific J. Math. 7 961-968. · Zbl 0078.37103
[26] Krasikov, I. and Roditty, Y. (1997). On a reconstruction problem for sequences. J. Combin. Theory Ser. A 77 344-348. · Zbl 0871.05002 · doi:10.1006/jcta.1997.2732
[27] Krishnamurthy, A., Mazumdar, A., McGregor, A. and Pal, S. (2019). Trace reconstruction: Generalized and parameterized. In 27th Annual European Symposium on Algorithms. LIPIcs. Leibniz Int. Proc. Inform. 144 Art. No. 68, 25. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern. · Zbl 1473.94054
[28] Lauri, J. and Scapellato, R. (2016). Topics in Graph Automorphisms and Reconstruction, 2nd ed. London Mathematical Society Lecture Note Series 432. Cambridge Univ. Press, Cambridge. · Zbl 1347.05001 · doi:10.1017/CBO9781316669846
[29] Levenshtein, V. I. (2001). Efficient reconstruction of sequences from their subsequences or supersequences. J. Combin. Theory Ser. A 93 310-332. · Zbl 0992.68155 · doi:10.1006/jcta.2000.3081
[30] Maranzatto, T. J. (2020). Tree trace reconstruction: Some results. Thesis, New College of Florida.
[31] McGregor, A., Price, E. and Vorotnikova, S. (2014). Trace reconstruction revisited. In Algorithms—ESA 2014. Lecture Notes in Computer Science 8737 689-700. Springer, Heidelberg. · Zbl 1425.68470 · doi:10.1007/978-3-662-44777-2_57
[32] Mitzenmacher, M. (2009). A survey of results for deletion channels and related synchronization channels. Probab. Surv. 6 1-33. · Zbl 1189.94058 · doi:10.1214/08-PS141
[33] Mossel, E. and Ross, N. (2019). Shotgun assembly of labeled graphs. IEEE Trans. Netw. Sci. Eng. 6 145-157. · doi:10.1109/TNSE.2017.2776913
[34] Nazarov, F. and Peres, Y. (2017). Trace reconstruction with \[\exp (O({n^{1/3}}))\] samples. In STOC’17—Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing 1042-1046. ACM, New York. · Zbl 1370.68087
[35] Organick, L., Ang, S. D., Chen, Y.-J., Lopez, R., Yekhanin, S., Makarychev, K., Racz, M. Z., Kamath, G., Gopalan, P. et al. (2018). Random access in large-scale DNA data storage. Nat. Biotechnol. 36 242-248.
[36] Ulam, S. M. (1960). A Collection of Mathematical Problems. Interscience Tracts in Pure and Applied Mathematics, No. 8. Interscience Publishers, New York. · Zbl 0086.24101
[37] Viswanathan, K. and Swaminathan, R. (2008). Improved string reconstruction over insertion-deletion channels. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms 399-408. ACM, New York. · Zbl 1192.94072
[38] Yazdi, S. H. T., Gabrys, R. and Milenkovic, O. (2017). Portable and error-free DNA-based data storage. Sci. Rep. 7 5011.
[39] Yazdi, S. H. T., Kiah, H. M., Garcia-Ruiz, E., Ma, J., Zhao, H. and Milenkovic, O. (2015). DNA-based storage: Trends and methods. IEEE Transactions on Molecular, Biological and Multi-Scale Communications 1 230-248.
[40] Zhang, K. and Shasha, D. (1989). Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18 1245-1262 · Zbl 0692.68047 · doi:10.1137/0218082
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.