Showing 1–50 of 78 results for author: Suciu, D

  1. arXiv:2408.14706  [pdf, other]

    cs.DB cs.PL

    Galley: Modern Query Optimization for Sparse Tensor Programs

    Authors: Kyle Deeds, Willow Ahrens, Magda Balazinska, Dan Suciu

    Abstract: The tensor programming abstraction has become a foundational paradigm for modern computing. This framework allows users to write high performance programs for bulk computation via a high-level imperative interface. Recent work has extended this paradigm to sparse tensors (i.e. tensors where most entries are not explicitly represented) with the use of sparse tensor compilers. These systems excel at…

    Submitted 31 August, 2024; v1 submitted 26 August, 2024; originally announced August 2024.

  2. arXiv:2408.07494  [pdf, other]

    cs.DB cs.LG

    QirK: Question Answering via Intermediate Representation on Knowledge Graphs

    Authors: Jan Luca Scheerer, Anton Lykov, Moe Kayali, Ilias Fountalis, Dan Olteanu, Nikolaos Vasiloglou, Dan Suciu

    Abstract: We demonstrate QirK, a system for answering natural language questions on Knowledge Graphs (KG). QirK can answer structurally complex questions that are still beyond the reach of emerging Large Language Models (LLMs). It does so using a unique combination of database technology, LLMs, and semantic search over vector embeddings. The glue for these components is an intermediate representation (IR).…

    Submitted 14 August, 2024; originally announced August 2024.

  3. Perceptions of Entrepreneurship Among Graduate Students: Challenges, Opportunities, and Cultural Biases

    Authors: Manuela Andreea Petrescu, Dan Mircea Suciu

    Abstract: The purpose of the paper is to examine the perceptions of entrepreneurship of graduate students enrolled in a digital-oriented entrepreneurship course, focusing on the challenges and opportunities related to starting a business. In today's digital era, businesses heavily depend on tailored software solutions to facilitate their operational processes, foster expansion, and enhance their competitive…

    Submitted 10 May, 2024; originally announced July 2024.

    Comments: In Proceedings of the 16th International Conference on Computer Supported Education - Volume 1, ISBN 978-989-758-697-2, ISSN 2184-5026, pages 347-354

    ACM Class: K.3.2; J.4

  4. arXiv:2405.06767  [pdf, other]

    cs.DB

    Color: A Framework for Applying Graph Coloring to Subgraph Cardinality Estimation

    Authors: Kyle Deeds, Diandre Sabale, Moe Kayali, Dan Suciu

    Abstract: Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality estimation methods which generally see catastrophic errors when estimating the size of queries with only a handful of joins. To overcome this, we propose COLOR, a…

    Submitted 10 May, 2024; originally announced May 2024.

  5. arXiv:2402.02001  [pdf, ps, other]

    cs.DB cs.IT

    PANDA: Query Evaluation in Submodular Width

    Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu

    Abstract: In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corr…

    Submitted 13 September, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

  6. arXiv:2401.16210  [pdf, ps, other]

    math.CO cs.DM

    The Non-Cancelling Intersections Conjecture

    Authors: Antoine Amarilli, Mikaël Monet, Dan Suciu

    Abstract: In this note, we present a conjecture on intersections of set families, and a rephrasing of the conjecture in terms of principal downsets of Boolean lattices. The conjecture informally states that, whenever we can express the measure of a union of sets in terms of the measure of some of their intersections using the inclusion-exclusion formula, then we can express the union as a set from these sam…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: 30 pages

  7. arXiv:2312.09331  [pdf, ps, other]

    cs.DB

    Insert-Only versus Insert-Delete in Dynamic Query Evaluation

    Authors: Mahmoud Abo Khamis, Ahmet Kara, Dan Olteanu, Dan Suciu

    Abstract: We study the dynamic query evaluation problem: Given a full conjunctive query Q and a sequence of updates to the input database, we construct a data structure that supports constant-delay enumeration of the tuples in the query output after each update. We show that a sequence of N insert-only updates to an initially empty database can be executed in total time O(N^w(Q)), where w(Q) is the fracti…

    Submitted 13 September, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

  8. arXiv:2309.12347  [pdf, other]

    cs.CY cs.SE

    Transitioning a Project-Based Course between Onsite and Online. An Experience Report

    Authors: Dan Mircea Suciu, Simona Motogna, Arthur-Jozsef Molnar

    Abstract: We present an investigation regarding the challenges faced by student teams across four consecutive iterations of a team-focused, project-based course in software engineering. The studied period includes the switch to fully online activities in the spring of 2020, and covers the return to face-to-face teaching two years later. We cover the feedback provided by over 1,500 students, collected in a f…

    Submitted 28 August, 2023; originally announced September 2023.

    Comments: 12 pages, 2 figures

  9. arXiv:2306.14211  [pdf, ps, other]

    cs.DB cs.CC cs.LO

    From Shapley Value to Model Counting and Back

    Authors: Ahmet Kara, Dan Olteanu, Dan Suciu

    Abstract: In this paper we investigate the problem of quantifying the contribution of each variable to the satisfying assignments of a Boolean function based on the Shapley value. Our main result is a polynomial-time equivalence between computing Shapley values and model counting for any class of Boolean functions that are closed under substitutions of variables with disjunctions of fresh variables. This…

    Submitted 25 June, 2023; originally announced June 2023.

    Comments: 22 pages

    ACM Class: F.4.1; F.2; H.2

  10. arXiv:2306.14075  [pdf, ps, other]

    cs.DB cs.IT

    Join Size Bounds using Lp-Norms on Degree Sequences

    Authors: Mahmoud Abo Khamis, Vasileios Nakos, Dan Olteanu, Dan Suciu

    Abstract: Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate s…

    Submitted 5 June, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

  11. arXiv:2306.09610  [pdf, other]

    cs.DB cs.LG

    CHORUS: Foundation Models for Unified Data Discovery and Exploration

    Authors: Moe Kayali, Anton Lykov, Ilias Fountalis, Nikolaos Vasiloglou, Dan Olteanu, Dan Suciu

    Abstract: We apply foundation models to data discovery and exploration tasks. Foundation models include large language models (LLMs) that show promising performance on a range of diverse tasks unrelated to their training. We show that these models are highly applicable to the data discovery and data exploration domain. When carefully used, they have superior capability on three representative tasks: table-c…

    Submitted 5 April, 2024; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: To appear in VLDB 2024

  12. arXiv:2304.11996  [pdf, other]

    cs.DB cs.IT

    Applications of Information Inequalities to Database Theory Problems

    Authors: Dan Suciu

    Abstract: The paper describes several applications of information inequalities to problems in database theory. The problems discussed include: upper bounds of a query's output, worst-case optimal join algorithms, the query domination problem, and the implication problem for approximate integrity constraints. The paper is self-contained: all required concepts and results from information inequalities are int…

    Submitted 4 June, 2024; v1 submitted 24 April, 2023; originally announced April 2023.

    Comments: This paper was invited for LICS'2023

  13. arXiv:2301.10841  [pdf, other]

    cs.DB

    Free Join: Unifying Worst-Case Optimal and Traditional Joins

    Authors: Yisu Remy Wang, Max Willsey, Dan Suciu

    Abstract: Over the last decade, worst-case optimal join (WCOJ) algorithms have emerged as a new paradigm for one of the most fundamental challenges in query processing: computing joins efficiently. Such an algorithm can be asymptotically faster than traditional binary joins, all the while remaining simple to understand and implement. However, they have been found to be less efficient than the old paradigm,…

    Submitted 27 January, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

  14. arXiv:2211.11912  [pdf, other]

    cs.DS

    Quasi-stable Coloring for Graph Compression: Approximating Max-Flow, Linear Programs, and Centrality

    Authors: Moe Kayali, Dan Suciu

    Abstract: We propose quasi-stable coloring, an approximate version of stable coloring. Stable coloring, also called color refinement, is a well-studied technique in graph theory for classifying vertices, which can be used to build compact, lossless representations of graphs. However, its usefulness is limited due to its reliance on strict symmetries. Real data compresses very poorly using color refinement.…

    Submitted 28 November, 2022; v1 submitted 21 November, 2022; originally announced November 2022.

    Comments: To be presented at VLDB 2023

  15. arXiv:2211.09864  [pdf, other]

    cs.DB

    SafeBound: A Practical System for Generating Cardinality Bounds

    Authors: Kyle Deeds, Dan Suciu, Magda Balazinska

    Abstract: Recent work has reemphasized the importance of cardinality estimates for query optimization. While new techniques have continuously improved in accuracy over time, they still generally allow for under-estimates which often lead optimizers to make overly optimistic decisions. This can be very costly for expensive queries. An alternative approach to estimation is cardinality bounding, also called pe…

    Submitted 17 November, 2022; originally announced November 2022.

  16. arXiv:2210.17071  [pdf, other]

    cs.LG cs.DB

    Computing Rule-Based Explanations by Leveraging Counterfactuals

    Authors: Zixuan Geng, Maximilian Schleich, Dan Suciu

    Abstract: Sophisticated machine models are increasingly used for high-stakes decisions in everyday life. There is an urgent need to develop effective explanation techniques for such automated decisions. Rule-Based Explanations have been proposed for high-stake decisions like loan applications, because they increase the users' trust in the decision. However, rule-based explanations are very inefficient to co…

    Submitted 31 October, 2022; originally announced October 2022.

  17. arXiv:2210.06267  [pdf, other]

    cs.DB cs.PL

    Optimizing Tensor Programs on Flexible Storage

    Authors: Maximilian Schleich, Amir Shaikhha, Dan Suciu

    Abstract: Tensor programs often need to process large tensors (vectors, matrices, or higher order tensors) that require a specialized storage format for their memory layout. Several such layouts have been proposed in the literature, such as the Coordinate Format, the Compressed Sparse Row format, and many others, that were especially designed to optimally store tensors with specific sparsity properties. How…

    Submitted 12 October, 2022; originally announced October 2022.

  18. arXiv:2207.08891  [pdf, other]

    cs.CR

    Wink: Deniable Secure Messaging

    Authors: Anrin Chakraborti, Darius Suciu, Radu Sion

    Abstract: End-to-end encrypted (E2EE) messaging is an essential first step in providing message confidentiality. Unfortunately, all security guarantees of end-to-end encryption are lost when keys or plaintext are disclosed, either due to device compromise or (sometimes lawful) coercion by powerful adversaries. This work introduces Wink, the first plausibly-deniable messaging system protecting message confid…

    Submitted 10 June, 2023; v1 submitted 18 July, 2022; originally announced July 2022.

  19. arXiv:2202.10390  [pdf, other]

    cs.DB

    Optimizing Recursive Queries with Program Synthesis

    Authors: Yisu Remy Wang, Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu

    Abstract: Most work on query optimization has concentrated on loop-free queries. However, data science and machine learning workloads today typically involve recursive or iterative computation. In this work, we propose a novel framework for optimizing recursive queries using methods from program synthesis. In particular, we introduce a simple yet powerful optimization rule called the "FGH-rule" which aims t…

    Submitted 21 February, 2022; originally announced February 2022.

  20. arXiv:2201.04166  [pdf, other]

    cs.DB

    Degree Sequence Bound For Join Cardinality Estimation

    Authors: Kyle Deeds, Dan Suciu, Magda Balazinska, Walter Cai

    Abstract: Recent work has demonstrated the catastrophic effects of poor cardinality estimates on query processing time. In particular, underestimating query cardinality can result in overly optimistic query plans which take orders of magnitude longer to complete than one generated with the true cardinality. Cardinality bounding avoids this pitfall by computing a strict upper bound on the query's output size…

    Submitted 30 March, 2022; v1 submitted 11 January, 2022; originally announced January 2022.

  21. arXiv:2105.14435  [pdf, ps, other]

    cs.DB

    Convergence of Datalog over (Pre-) Semirings

    Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Reinhard Pichler, Dan Suciu, Yisu Remy Wang

    Abstract: Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary s…

    Submitted 24 January, 2024; v1 submitted 30 May, 2021; originally announced May 2021.

  22. arXiv:2101.01292  [pdf, other]

    cs.LG cs.DB

    GeCo: Quality Counterfactual Explanations in Real Time

    Authors: Maximilian Schleich, Zixuan Geng, Yihong Zhang, Dan Suciu

    Abstract: Machine learning is increasingly applied in high-stakes decision making that directly affects people's lives, and this leads to an increased demand for systems to explain their decisions. Explanations often take the form of counterfactuals, which consist of conveying to the end user what she/he needs to change in order to improve the outcome. Computing counterfactual explanations is challenging, b…

    Submitted 18 May, 2021; v1 submitted 4 January, 2021; originally announced January 2021.

    Comments: 16 pages, 12 figures, 3 tables, 3 algorithms

  23. A Near-Optimal Parallel Algorithm for Joining Binary Relations

    Authors: Bas Ketsman, Dan Suciu, Yufei Tao

    Abstract: We present a constant-round algorithm in the massively parallel computation (MPC) model for evaluating a natural join where every input relation has two attributes. Our algorithm achieves a load of $\tilde{O}(m/p^{1/ρ})$ where $m$ is the total size of the input relations, $p$ is the number of machines, $ρ$ is the join's fractional edge covering number, and $\tilde{O}(.)$ hides a polylogarithmic fa…

    Submitted 4 May, 2022; v1 submitted 29 November, 2020; originally announced November 2020.

    Journal ref: Logical Methods in Computer Science, Volume 18, Issue 2 (May 5, 2022) lmcs:6944

  24. arXiv:2009.08634  [pdf, ps, other]

    cs.AI cs.CC cs.LG

    On the Tractability of SHAP Explanations

    Authors: Guy Van den Broeck, Anton Lykov, Maximilian Schleich, Dan Suciu

    Abstract: SHAP explanations are a popular feature-attribution mechanism for explainable AI. They use game-theoretic notions to measure the influence of individual features on the prediction of a machine learning model. Despite a lot of recent interest from both academia and industry, it is not known whether SHAP explanations of common machine learning models can be computed efficiently. In this paper, we es…

    Submitted 30 January, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

    Comments: Proceedings of the 35th AAAI Conference on Artificial Intelligence

  25. arXiv:2008.00896  [pdf, other]

    cs.DB cs.CC

    A Dichotomy for the Generalized Model Counting Problem for Unions of Conjunctive Queries

    Authors: Batya Kenig, Dan Suciu

    Abstract: We study the $generalized~model~counting~problem$, defined as follows: given a database, and a set of deterministic tuples, count the number of subsets of the database that include all deterministic tuples and satisfy the query. This problem is computationally equivalent to the evaluation of the query over a tuple-independent probabilistic database where all tuples have probabilities in…

    Submitted 20 May, 2021; v1 submitted 3 August, 2020; originally announced August 2020.

  26. arXiv:2004.08783  [pdf, other]

    cs.IT cs.CC

    Decision Problems in Information Theory

    Authors: Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, Dan Suciu

    Abstract: Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into level…

    Submitted 27 April, 2020; v1 submitted 19 April, 2020; originally announced April 2020.

  27. arXiv:2004.03644  [pdf, other]

    cs.DB cs.AI cs.LG

    Causal Relational Learning

    Authors: Babak Salimi, Harsh Parikh, Moe Kayali, Sudeepa Roy, Lise Getoor, Dan Suciu

    Abstract: Causal inference is at the heart of empirical research in natural and social sciences and is critical for scientific discovery and informed decision making. The gold standard in causal inference is performing randomized controlled trials; unfortunately these are not always feasible due to ethical, legal, or cost constraints. As an alternative, methodologies for causal inference from observational…

    Submitted 7 April, 2020; originally announced April 2020.

  28. arXiv:2003.06868  [pdf, other]

    cs.LG cs.AI cs.DB stat.ML

    Causality-based Explanation of Classification Outcomes

    Authors: Leopoldo Bertossi, Jordan Li, Maximilian Schleich, Dan Suciu, Zografoula Vagena

    Abstract: We propose a simple definition of an explanation for the outcome of a classifier based on concepts from causality. We compare it with previously proposed notions of explanation, and study their complexity. We conduct an experimental evaluation with two real datasets from the financial domain.

    Submitted 25 May, 2020; v1 submitted 15 March, 2020; originally announced March 2020.

    Comments: 16 pages, 6 figures, 1 table

  29. arXiv:2002.09799  [pdf, other]

    cs.DB

    Sample Debiasing in the Themis Open World Database System (Extended Version)

    Authors: Laurel Orr, Magda Balazinska, Dan Suciu

    Abstract: Open world database management systems assume tuples not in the database still exist and are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage apriori population aggregate information to develop a…

    Submitted 29 February, 2020; v1 submitted 22 February, 2020; originally announced February 2020.

    Comments: SIGMOD 2020

  30. arXiv:2002.07951  [pdf, other]

    cs.DB cs.PL

    SPORES: Sum-Product Optimization via Relational Equality Saturation for Large Scale Linear Algebra

    Authors: Yisu Remy Wang, Shana Hutchison, Jonathan Leang, Bill Howe, Dan Suciu

    Abstract: Machine learning algorithms are commonly specified in linear algebra (LA). LA expressions can be rewritten into more efficient forms, by taking advantage of input properties such as sparsity, as well as program properties such as common subexpressions and fusible operators. The complex interaction among these properties' impact on the execution cost poses a challenge to optimizing compilers. Exist…

    Submitted 22 December, 2020; v1 submitted 18 February, 2020; originally announced February 2020.

  31. arXiv:1912.07777  [pdf, other]

    cs.DB cs.LG

    Mosaic: A Sample-Based Database System for Open World Query Processing

    Authors: Laurel Orr, Samuel Ainsworth, Walter Cai, Kevin Jamieson, Magda Balazinska, Dan Suciu

    Abstract: Data scientists have relied on samples to analyze populations of interest for decades. Recently, with the increase in the number of public data repositories, sample data has become easier to access. It has not, however, become easier to analyze. This sample data is arbitrarily biased with an unknown sampling probability, meaning data scientists must manually debias the sample with custom technique…

    Submitted 10 January, 2020; v1 submitted 16 December, 2019; originally announced December 2019.

    Comments: CIDR 2020

  32. arXiv:1911.12933  [pdf, other]

    cs.DB

    Mining Approximate Acyclic Schemes from Relations

    Authors: Batya Kenig, Pranay Mundra, Guna Prasad, Babak Salimi, Dan Suciu

    Abstract: Acyclic schemes have numerous applications in databases and in machine learning, such as improved design, more efficient storage, and increased performance for queries and machine learning algorithms. Multivalued dependencies (MVDs) are the building blocks of acyclic schemes. The discovery from data of both MVDs and acyclic schemes is more challenging than other forms of data dependencies, such as…

    Submitted 28 November, 2019; originally announced November 2019.

  33. arXiv:1911.04948  [pdf, other]

    cs.DB

    EntropyDB: A Probabilistic Approach to Approximate Query Processing

    Authors: Laurel Orr, Magdalena Balazinska, Dan Suciu

    Abstract: We present EntropyDB, an interactive data exploration system that uses a probabilistic approach to generate a small, query-able summary of a dataset. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulati…

    Submitted 9 November, 2019; originally announced November 2019.

    Comments: arXiv admin note: text overlap with arXiv:1703.03856

    Journal ref: VLDB Journal 2019

  34. arXiv:1908.07924  [pdf, other]

    cs.DB cs.LG

    Data Management for Causal Algorithmic Fairness

    Authors: Babak Salimi, Bill Howe, Dan Suciu

    Abstract: Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflects discrimination, suggesting a data management problem. In this paper, we first make a distinction between associational and causal definitions of fairness in the literature and argue that the concept of fairness requires c…

    Submitted 30 September, 2019; v1 submitted 20 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: text overlap with arXiv:1902.08283

  35. arXiv:1906.09727  [pdf, ps, other]

    cs.DB cs.IT

    Bag Query Containment and Information Theory

    Authors: Mahmoud Abo Khamis, Phokion G. Kolaitis, Hung Q. Ngo, Dan Suciu

    Abstract: The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the…

    Submitted 5 July, 2021; v1 submitted 24 June, 2019; originally announced June 2019.

  36. arXiv:1902.08283  [pdf, other]

    cs.DB cs.AI

    Capuchin: Causal Database Repair for Algorithmic Fairness

    Authors: Babak Salimi, Luke Rodriguez, Bill Howe, Dan Suciu

    Abstract: Fairness is increasingly recognized as a critical component of machine learning systems. However, it is the underlying data on which these systems are trained that often reflect discrimination, suggesting a database repair problem. Existing treatments of fairness rely on statistical correlations that can be fooled by statistical anomalies, such as Simpson's paradox. Proposals for causality-based d…

    Submitted 1 October, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

    Journal ref: Proceedings of the 2019 International Conference on Management of Data. ACM, 2019

  37. Integrity Constraints Revisited: From Exact to Approximate Implication

    Authors: Batya Kenig, Dan Suciu

    Abstract: Integrity constraints such as functional dependencies (FD) and multi-valued dependencies (MVD) are fundamental in database schema design. Likewise, probabilistic conditional independences (CI) are crucial for reasoning about multivariate probability distributions. The implication problem studies whether a set of constraints (antecedents) implies another constraint (consequent), and has been invest…

    Submitted 10 January, 2022; v1 submitted 24 December, 2018; originally announced December 2018.

    Journal ref: Logical Methods in Computer Science, Volume 18, Issue 1 (January 11, 2022) lmcs:6925

  38. arXiv:1810.01997  [pdf, other]

    cs.DB

    Improving High Contention OLTP Performance via Transaction Scheduling

    Authors: Guna Prasaad, Alvin Cheung, Dan Suciu

    Abstract: Research in transaction processing has made significant progress in improving the performance of multi-core in-memory transactional systems. However, the focus has mainly been on low-contention workloads. Modern transactional systems perform poorly on workloads with transactions accessing a few highly contended data items. We observe that most transactional workloads, including those with high con…

    Submitted 3 October, 2018; originally announced October 2018.

  39. arXiv:1804.00443  [pdf, ps, other]

    cs.DB

    A Note on the Hardness of the Critical Tuple Problem

    Authors: Egor V. Kostylev, Dan Suciu

    Abstract: The notion of critical tuple was introduced by Miklau and Suciu (Gerome Miklau and Dan Suciu. A formal analysis of information disclosure in data exchange. J. Comput. Syst. Sci., 73(3):507-534, 2007), who also claimed that the problem of checking whether a tuple is non-critical is complete for the second level of the polynomial hierarchy. Kostylev identified an error in the 12-page-long hardness p…

    Submitted 2 April, 2018; originally announced April 2018.

  40. arXiv:1803.04562  [pdf, other]

    cs.DB

    Bias in OLAP Queries: Detection, Explanation, and Removal

    Authors: Babak Salimi, Johannes Gehrke, Dan Suciu

    Abstract: On line analytical processing (OLAP) is an essential element of decision-support systems. OLAP tools provide insights and understanding needed for improved decision making. However, the answers to OLAP queries can be biased and lead to perplexing and incorrect insights. In this paper, we propose HypDB, a system to detect, explain, and to resolve bias in decision-support queries. We give a simple d…

    Submitted 24 July, 2018; v1 submitted 12 March, 2018; originally announced March 2018.

    Comments: This paper is an extended version of a paper presented at SIGMOD 2018

  41. arXiv:1802.02229  [pdf, other]

    cs.DB cs.PL

    Axiomatic Foundations and Algorithms for Deciding Semantic Equivalences of SQL Queries

    Authors: Shumo Chu, Brendan Murphy, Jared Roesch, Alvin Cheung, Dan Suciu

    Abstract: Deciding the equivalence of SQL queries is a fundamental problem in data management. As prior work has mainly focused on studying the theoretical limitations of the problem, very few implementations for checking such equivalences exist. In this paper, we present a new formalism and implementation for reasoning about the equivalences of SQL queries. Our formalism, U-semiring, extends SQL's semiring…

    Submitted 23 May, 2018; v1 submitted 6 February, 2018; originally announced February 2018.

  42. arXiv:1712.07445  [pdf, ps, other]

    cs.DB cs.DM math.CO

    Boolean Tensor Decomposition for Conjunctive Queries with Negation

    Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Olteanu, Dan Suciu

    Abstract: We propose an algorithm for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the best known algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width. The query complexity depends on the structure of the negated subquery; in general it…

    Submitted 27 January, 2019; v1 submitted 20 December, 2017; originally announced December 2017.

  43. LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation

    Authors: Dylan Hutchison, Bill Howe, Dan Suciu

    Abstract: Analytics tasks manipulate structured data with variants of relational algebra (RA) and quantitative data with variants of linear algebra (LA). The two computational models have overlapping expressiveness, motivating a common programming model that affords unified reasoning and algorithm design. At the logical level we propose Lara, a lean algebra of three operators, that expresses RA and LA as we…

    Submitted 13 April, 2017; v1 submitted 21 March, 2017; originally announced March 2017.

    Comments: 10 pages, to appear in the BeyondMR workshop at the 2017 ACM SIGMOD conference

  44. arXiv:1703.03856  [pdf, other]

    cs.DB

    Probabilistic Database Summarization for Interactive Data Exploration

    Authors: Laurel Orr, Magda Balazinska, Dan Suciu

    Abstract: We present a probabilistic approach to generate a small, query-able summary of a dataset for interactive data exploration. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation of our probabilistic rep…

    Submitted 23 May, 2017; v1 submitted 10 March, 2017; originally announced March 2017.

    Comments: To appear VLDB 2017

  45. arXiv:1701.09007  [pdf, other]

    cs.DB

    Research Directions for Principles of Data Management (Dagstuhl Perspectives Workshop 16151)

    Authors: Serge Abiteboul, Marcelo Arenas, Pablo Barceló, Meghyn Bienvenu, Diego Calvanese, Claire David, Richard Hull, Eyke Hüllermeier, Benny Kimelfeld, Leonid Libkin, Wim Martens, Tova Milo, Filip Murlak, Frank Neven, Magdalena Ortiz, Thomas Schwentick, Julia Stoyanovich, Jianwen Su, Dan Suciu, Victor Vianu, Ke Yi

    Abstract: In April 2016, a community of researchers working in the area of Principles of Data Management (PDM) joined in a workshop at the Dagstuhl Castle in Germany. The workshop was organized jointly by the Executive Committee of the ACM Symposium on Principles of Database Systems (PODS) and the Council of the International Conference on Database Theory (ICDT). The mission of this workshop was to identify…

    Submitted 31 January, 2017; originally announced January 2017.

  46. arXiv:1612.02503  [pdf, ps, other]

    cs.DB cs.DS cs.IT

    What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another?

    Authors: Mahmoud Abo Khamis, Hung Q. Ngo, Dan Suciu

    Abstract: Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. We prove analogous output bounds for disjunctive datalog rules, and answer several open questions regarding the tightness and looseness of these bounds along the way. Our bounds are…

    Submitted 23 December, 2023; v1 submitted 7 December, 2016; originally announced December 2016.

  47. arXiv:1609.03540  [pdf, ps, other]

    cs.DB cs.AI cs.LG cs.PF

    ZaliQL: A SQL-Based Framework for Drawing Causal Inference from Big Data

    Authors: Babak Salimi, Dan Suciu

    Abstract: Causal inference from observational data is a subject of active research and development in statistics and computer science. Many toolkits have been developed for this purpose that depend on statistical software. However, these toolkits do not scale to large datasets. In this paper we describe a suite of techniques for expressing causal inference tasks from observational data in SQL. This suite s…

    Submitted 12 September, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

  48. arXiv:1607.04822  [pdf, other]

    cs.PL cs.DB cs.LO

    HoTTSQL: Proving Query Rewrites with Univalent SQL Semantics

    Authors: Shumo Chu, Konstantin Weitz, Alvin Cheung, Dan Suciu

    Abstract: Every database system contains a query optimizer that performs query rewrites. Unfortunately, developing query optimizers remains a highly challenging task. Part of the challenge comes from the intricacies and rich features of query languages, which make reasoning about rewrite rules difficult. In this paper, we propose a machine-checkable denotational semantics for SQL, the de facto language fo…

    Submitted 5 August, 2016; v1 submitted 16 July, 2016; originally announced July 2016.

  49. arXiv:1604.03607  [pdf, other]

    cs.DB cs.PL

    Lara: A Key-Value Algebra underlying Arrays and Relations

    Authors: Dylan Hutchison, Bill Howe, Dan Suciu

    Abstract: Data processing systems roughly group into families such as relational, array, graph, and key-value. Many data processing tasks exceed the capabilities of any one family, require data stored across families, or run faster when partitioned onto multiple families. Discovering ways to execute computation among multiple available systems, let alone discovering an optimal execution plan, is challenging…

    Submitted 12 April, 2016; originally announced April 2016.

    Comments: Working draft

  50. arXiv:1604.01848  [pdf, other]

    cs.DB

    Worst-Case Optimal Algorithms for Parallel Query Processing

    Authors: Paul Beame, Paraschos Koutris, Dan Suciu

    Abstract: In this paper, we study the communication complexity for the problem of computing a conjunctive query on a large database in a parallel setting with $p$ servers. In contrast to previous work, where upper and lower bounds on the communication were specified for particular structures of data (either data without skew, or data with specific types of skew), in this work we focus on worst-case analysis…

    Submitted 6 April, 2016; originally announced April 2016.