research-article

Efficient Enumeration of Recursive Plans in Transformation-Based Query Optimizers

Published: 30 August 2024

Abstract

Query optimizers built on the transformation-based Volcano/Cascades framework are used in many database systems. Transformations proposed earlier on the logical query dag (LQDAG), the key data structure in such a framework, are restricted to recursion-free queries. We propose the recursive logical query dag (RLQDAG), which extends the LQDAG with the ability to capture and transform recursive queries, leveraging recent developments in recursive relational algebra. Specifically, this extension includes: (i) the ability to capture and transform sets of recursive relational terms, thanks to (ii) annotated equivalence nodes that guide transformations, which are more complex in the presence of recursion; and (iii) RLQDAG rewrite rules that transform sets of subterms in a grouped manner, rather than transforming individual terms sequentially, and that (iv) incrementally update the necessary annotations. Core concepts of the RLQDAG are formalized using a syntax and formal semantics, with a particular focus on subterm sharing and recursion. The result is a clean generalization of the LQDAG, enabling efficient exploration of plan spaces for recursive queries. An implementation of the proposed approach shows significant performance gains compared to the state of the art.
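
To make the notions of equivalence nodes and subterm sharing concrete, here is a minimal sketch of a classic, recursion-free LQDAG-style memo in Python. It is not the RLQDAG of the paper: it has no fixpoint operator, no annotations, and no grouped rewrite rules. The names `Op`, `Memo`, `add`, `apply_join_commutativity`, and `count_plans` are hypothetical and chosen only for illustration. The sketch shows how one transformation (join commutativity) adds alternatives to existing equivalence nodes, and how shared children let a few nodes represent several plans.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    """Operation node: an operator applied to child equivalence nodes (by id)."""
    name: str                          # e.g. "scan:R" or "join" (illustrative)
    children: Tuple[int, ...] = ()


@dataclass
class Memo:
    """Hypothetical LQDAG-like memo: equivalence node id -> alternative operations."""
    groups: Dict[int, List[Op]] = field(default_factory=dict)
    index: Dict[Op, int] = field(default_factory=dict)  # enables subterm sharing

    def add(self, op: Op, group: Optional[int] = None) -> int:
        """Insert op, reusing the existing equivalence node if op is already known."""
        if op in self.index:
            return self.index[op]
        if group is None:
            group = len(self.groups)
            self.groups[group] = []
        self.groups[group].append(op)
        self.index[op] = group
        return group

    def apply_join_commutativity(self) -> int:
        """Add A join B => B join A inside each equivalence node; return number added."""
        added = 0
        for gid, ops in list(self.groups.items()):
            for op in list(ops):
                if op.name == "join":
                    flipped = Op("join", op.children[::-1])
                    if flipped not in self.index:
                        self.add(flipped, gid)
                        added += 1
        return added

    def count_plans(self, gid: int) -> int:
        """Number of distinct operator trees represented below equivalence node gid."""
        total = 0
        for op in self.groups[gid]:
            n = 1
            for child in op.children:
                n *= self.count_plans(child)
            total += n
        return total


# Usage: the term (R join S) join T, before and after one round of commutativity.
memo = Memo()
r, s, t = (memo.add(Op(f"scan:{n}")) for n in "RST")
rs = memo.add(Op("join", (r, s)))
root = memo.add(Op("join", (rs, t)))
plans_before = memo.count_plans(root)
new_ops = memo.apply_join_commutativity()
plans_after = memo.count_plans(root)
print(plans_before, new_ops, plans_after)  # prints: 1 2 4
```

Two added operation nodes are enough to represent four plans here, because both alternatives at the root point to the same child equivalence node. This sharing is the property that the abstract says the RLQDAG preserves while extending it to recursive terms.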



Published In

Proceedings of the VLDB Endowment, Volume 17, Issue 11
July 2024
1039 pages

Publisher

VLDB Endowment
