×

Tree compression with top trees. (English) Zbl 1327.68085

Summary: We introduce a new compression scheme for labeled trees based on top trees. Our compression scheme is the first to simultaneously take advantage of internal repeats in the tree (as opposed to the classical DAG compression that only exploits rooted subtree repeats) while also supporting fast navigational queries directly on the compressed representation. We show that the new compression scheme achieves close to optimal worst-case compression, can compress exponentially better than DAG compression, is never much worse than DAG compression, and supports navigational queries in logarithmic time.

MSC:

68P05 Data structures
68P10 Searching and sorting
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)

Software:

TreeRePair; XMill

References:

[1] Buneman, P.; Grohe, M.; Koch, C., Path queries on compressed XML, (Proc. 29th VLDB (2003)), 141-152
[2] Frick, M.; Grohe, M.; Koch, C., Query evaluation on compressed trees, (Proc. 18th LICS (2003)), 188-197
[3] Busatto, G.; Lohrey, M.; Maneth, S., Grammar-based tree compression (2004), Tech. rep., EPFL
[4] Busatto, G.; Lohrey, M.; Maneth, S., Efficient memory representation of XML document trees, Inf. Syst., 33, 4-5, 456-474 (2008)
[5] Lohrey, M.; Maneth, S.; Mennicke, R., Tree structure compression with repair, Arxiv preprint
[6] Lohrey, M.; Maneth, S., The complexity of tree automata and XPath on grammar-compressed trees, Theor. Comput. Sci., 363, 2, 196-210 (2006) · Zbl 1153.68402
[7] Maneth, S.; Busatto, G., Tree transducers and tree compressions, (Proc. 7th FOSSACS (2004)), 363-377 · Zbl 1126.68449
[8] Sakr, S., XML compression techniques: a survey and comparison, J. Comput. Syst. Sci., 75, 5, 303-322 (2009) · Zbl 1182.68057
[9] Downey, P. J.; Sethi, R.; Tarjan, R. E., Variations on the common subexpression problem, J. ACM, 27, 758-771 (1980) · Zbl 0458.68026
[10] Muchnick, S. S., Advanced Compiler Design Implementation (1997), Morgan Kaufmann
[11] Meinel, C.; Theobald, T., Algorithms and Data Structures in VLSI Design: OBDD-Foundations and Applications (1998), Springer · Zbl 0899.68040
[12] Lohrey, M.; Maneth, S.; Noeth, E., XML compression via DAGs, (Proceedings of the 16th International Conference on Database Theory (2013), ACM), 69-80
[13] Adiego, J.; Navarro, G.; de la Fuente, P., Lempel-Ziv compression of highly structured documents, J. Am. Soc. Inf. Sci. Technol., 58, 4, 461-478 (2007)
[14] Bille, P.; Landau, G.; Raman, R.; Rao, S.; Sadakane, K.; Weimann, O., Random access to grammar-compressed strings, (Proc. 22nd SODA (2011)), 373-389 · Zbl 1375.68229
[15] Charikar, M.; Lehman, E.; Lehman, A.; Liu, D.; Panigrahy, R.; Prabhakaran, M.; Sahai, A.; Shelat, A., The smallest grammar problem, IEEE Trans. Inf. Theory, 51, 7, 2554-2576 (2005) · Zbl 1296.68086
[16] Jacobson, G., Space-efficient static trees and graphs, (Proc. 30th FOCS (1989)), 549-554
[17] Munro, J. I.; Raman, V., Succinct representation of balanced parentheses and static trees, SIAM J. Comput., 31, 3, 762-776 (2001) · Zbl 1017.68037
[18] Benoit, D.; Demaine, E.; Munro, I.; Raman, R.; Raman, V.; Rao, S., Representing trees of higher degree, Algorithmica, 43, 275-292 (2005) · Zbl 1086.68034
[19] Geary, R.; Raman, R.; Raman, V., Succinct ordinal trees with level-ancestor queries, (Proc. 15th SODA (2004)), 1-10 · Zbl 1317.68043
[20] Ferragina, P.; Luccio, F.; Manzini, G.; Muthukrishnan, S., Compressing and indexing labeled trees, with applications, J. ACM, 57, 1-33 (2009) · Zbl 1326.68132
[21] Cheney, J., Compressing XML with multiplexed hierarchical PPM models, (Proc. 11th IEEE Data Compression Conf.. Proc. 11th IEEE Data Compression Conf., DCC (2001)), 163-172
[22] Liefke, H.; Suciu, D., Xmill: an efficient compressor for XML data, (Proc. 2000 ACM SIGMOD Int. Conf. on Manag. of Data. Proc. 2000 ACM SIGMOD Int. Conf. on Manag. of Data, SIGMOD (2000)), 153-164
[23] Alstrup, S.; Holm, J.; de Lichtenberg, K.; Thorup, M., Minimizing diameters of dynamic trees, (Proc. 24th ICALP (1997)), 270-280 · Zbl 1401.68240
[24] Alstrup, S.; Holm, J.; Thorup, M., Maintaining center and median in dynamic trees, (Proc. 7th SWAT (2000)), 46-56 · Zbl 0966.68510
[25] Alstrup, S.; Holm, J.; Lichtenberg, K. D.; Thorup, M., Maintaining information in fully-dynamic trees with top trees, ACM Trans. Algorithms, 1, 243-264 (2003) · Zbl 1321.68211
[26] Gasieniec, L.; Karpinski, M.; Plandowski, W.; Rytter, W., Efficient algorithms for Lempel-Ziv encoding, (Proc. 4th SWAT (1996)), 392-403 · Zbl 1502.68379
[27] Rytter, W., Grammar compression, LZ-encodings, and string algorithms with implicit input, (Proc. 31st ICALP (2004)), 15-27 · Zbl 1099.68028
[28] Lohrey, M., Algorithmics on SLP-compressed strings: a survey, Groups Complex. Cryptol., 4, 2, 241-299 (2012) · Zbl 1285.68088
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.