-
IFTT-PIN: A Self-Calibrating PIN-Entry Method
Authors:
Kathryn McConkey,
Talha Enes Ayranci,
Mohamed Khamis,
Jonathan Grizou
Abstract:
Personalising an interface to the needs and preferences of a user often incurs additional interaction steps. In this paper, we demonstrate a novel method that enables the personalising of an interface without the need for explicit calibration procedures, via a process we call self-calibration. A second-order effect of self-calibration is that an outside observer cannot easily infer what a user is…
▽ More
Personalising an interface to the needs and preferences of a user often incurs additional interaction steps. In this paper, we demonstrate a novel method that enables the personalising of an interface without the need for explicit calibration procedures, via a process we call self-calibration. A second-order effect of self-calibration is that an outside observer cannot easily infer what a user is trying to achieve because they cannot interpret the user's actions. To explore this security angle, we developed IFTT-PIN (If This Then PIN) as the first self-calibrating PIN-entry method. When using IFTT-PIN, users are free to choose any button for any meaning without ever explicitly communicating their choice to the machine. IFTT-PIN infers both the user's PIN and their preferred button mapping at the same time. This paper presents the concept, implementation, and interactive demonstrations of IFTT-PIN, as well as an evaluation against shoulder surfing attacks. Our study (N=24) shows that by adding self-calibration to an existing PIN entry method, IFTT-PIN statistically significantly decreased PIN attack decoding rate by ca. 8.5 times (p=1.1e-9), while only decreasing the PIN entry encoding rate by ca. 1.4 times (p=0.02), leading to a positive security-usability trade-off. IFTT-PIN's entry rate significantly improved 21 days after first exposure (p=3.6e-6) to the method, suggesting self-calibrating interfaces are memorable despite using an initially undefined user interface. Self-calibration methods might lead to novel opportunities for interaction that are more inclusive and versatile, a potentially interesting challenge for the community. A short introductory video is available at https://youtu.be/pP5sfniNRns.
△ Less
Submitted 2 July, 2024;
originally announced July 2024.
-
PANDA: Query Evaluation in Submodular Width
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Dan Suciu
Abstract:
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corr…
▽ More
In recent years, several information-theoretic upper bounds have been introduced on the output size and evaluation cost of database join queries. These bounds vary in their power depending on both the type of statistics on input relations and the query plans that they support. This motivated the search for algorithms that can compute the output of a join query in times that are bounded by the corresponding information-theoretic bounds. In this paper, we describe PANDA, an algorithm that takes a Shannon-inequality that underlies the bound, and translates each proof step into an algorithmic step corresponding to some database operation. PANDA computes answers to a conjunctive query in time given by the the submodular width plus the output size of the query. The version in this paper represents a significant simplification of the original version [ANS, PODS'17].
△ Less
Submitted 13 September, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
Insert-Only versus Insert-Delete in Dynamic Query Evaluation
Authors:
Mahmoud Abo Khamis,
Ahmet Kara,
Dan Olteanu,
Dan Suciu
Abstract:
We study the dynamic query evaluation problem: Given a full conjunctive query Q and a sequence of updates to the input database, we construct a data structure that supports constant-delay enumeration of the tuples in the query output after each update.
We show that a sequence of N insert-only updates to an initially empty database can be executed in total time O(N^w(Q)), where w(Q) is the fracti…
▽ More
We study the dynamic query evaluation problem: Given a full conjunctive query Q and a sequence of updates to the input database, we construct a data structure that supports constant-delay enumeration of the tuples in the query output after each update.
We show that a sequence of N insert-only updates to an initially empty database can be executed in total time O(N^w(Q)), where w(Q) is the fractional hypertree width of Q. This matches the complexity of the static query evaluation problem for Q and a database of size N. One corollary is that the amortized time per single-tuple insert is constant for acyclic full conjunctive queries.
In contrast, we show that a sequence of N inserts and deletes can be executed in total time O(N^w(Q')), where Q' is obtained from Q by extending every relational atom with extra variables that represent the "lifespans" of tuples in the database. We show that this reduction is optimal in the sense that the static evaluation runtime of Q' provides a lower bound on the total update time for the output of Q. Our approach achieves amortized optimal update times for the hierarchical and Loomis-Whitney join queries.
△ Less
Submitted 13 September, 2024; v1 submitted 14 December, 2023;
originally announced December 2023.
-
An End-to-End Review of Gaze Estimation and its Interactive Applications on Handheld Mobile Devices
Authors:
Yaxiong Lei,
Shijing He,
Mohamed Khamis,
Juan Ye
Abstract:
In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially…
▽ More
In recent years we have witnessed an increasing number of interactive systems on handheld mobile devices which utilise gaze as a single or complementary interaction modality. This trend is driven by the enhanced computational power of these devices, higher resolution and capacity of their cameras, and improved gaze estimation accuracy obtained from advanced machine learning techniques, especially in deep learning. As the literature is fast progressing, there is a pressing need to review the state of the art, delineate the boundary, and identify the key research challenges and opportunities in gaze estimation and interaction. This paper aims to serve this purpose by presenting an end-to-end holistic view in this area, from gaze capturing sensors, to gaze estimation workflows, to deep learning techniques, and to gaze interactive applications.
△ Less
Submitted 30 June, 2023;
originally announced July 2023.
-
Join Size Bounds using Lp-Norms on Degree Sequences
Authors:
Mahmoud Abo Khamis,
Vasileios Nakos,
Dan Olteanu,
Dan Suciu
Abstract:
Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate s…
▽ More
Estimating the output size of a query is a fundamental yet longstanding problem in database query processing. Traditional cardinality estimators used by database systems can routinely underestimate the true output size by orders of magnitude, which leads to significant system performance penalty. Recently, upper bounds have been proposed that are based on information inequalities and incorporate sizes and max-degrees from input relations, yet they their main benefit is limited to cyclic queries, because they degenerate to rather trivial formulas on acyclic queries.
We introduce a significant extension of the upper bounds, by incorporating $\ell_p$-norms of the degree sequences of join attributes. Our bounds are significantly lower than previously known bounds, even when applied to acyclic queries. These bounds are also based on information theory, they come with a matching query evaluation algorithm, are computable in exponential time in the query size, and are provably tight when all degrees are "simple".
△ Less
Submitted 5 June, 2024; v1 submitted 24 June, 2023;
originally announced June 2023.
-
DynamicRead: Exploring Robust Gaze Interaction Methods for Reading on Handheld Mobile Devices under Dynamic Conditions
Authors:
Yaxiong Lei,
Yuheng Wang,
Tyler Caslin,
Alexander Wisowaty,
Xu Zhu,
Mohamed Khamis,
Juan Ye
Abstract:
Enabling gaze interaction in real-time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how the gaze can be used in a real-time application, and…
▽ More
Enabling gaze interaction in real-time on handheld mobile devices has attracted significant attention in recent years. An increasing number of research projects have focused on sophisticated appearance-based deep learning models to enhance the precision of gaze estimation on smartphones. This inspires important research questions, including how the gaze can be used in a real-time application, and what type of gaze interaction methods are preferable under dynamic conditions in terms of both user acceptance and delivering reliable performance. To address these questions, we design four types of gaze scrolling techniques: three explicit technique based on Gaze Gesture, Dwell time, and Pursuit; and one implicit technique based on reading speed to support touch-free, page-scrolling on a reading application. We conduct a 20-participant user study under both sitting and walking settings and our results reveal that Gaze Gesture and Dwell time-based interfaces are more robust while walking and Gaze Gesture has achieved consistently good scores on usability while not causing high cognitive workload.
△ Less
Submitted 19 April, 2023;
originally announced April 2023.
-
Understanding Dynamic Human-Robot Proxemics in the Case of Four-Legged Canine-Inspired Robots
Authors:
Xiangmin Xu,
Emma Liying Li,
Mohamed Khamis,
Guodong Zhao,
Robin Bretin
Abstract:
Recently, quadruped robots have been well developed with potential applications in different areas, such as care homes, hospitals, and other social areas. To ensure their integration in such social contexts, it is essential to understand people's proxemic preferences around such robots. In this paper, we designed a human-quadruped-robot interaction study (N = 32) to investigate the effect of 1) di…
▽ More
Recently, quadruped robots have been well developed with potential applications in different areas, such as care homes, hospitals, and other social areas. To ensure their integration in such social contexts, it is essential to understand people's proxemic preferences around such robots. In this paper, we designed a human-quadruped-robot interaction study (N = 32) to investigate the effect of 1) different facing orientations and 2) the gaze of a moving robot on human proxemic distance. Our work covers both static and dynamic interaction scenarios. We found a statistically significant effect of both the robot's facing direction and its gaze on preferred personal distances. The distances humans established towards certain robot behavioral modes reflect their attitudes, thereby guiding the design of future autonomous robots.
△ Less
Submitted 16 September, 2024; v1 submitted 21 February, 2023;
originally announced February 2023.
-
The Dark Side of Perceptual Manipulations in Virtual Reality
Authors:
Wen-Jie Tseng,
Elise Bonnail,
Mark McGill,
Mohamed Khamis,
Eric Lecolinet,
Samuel Huron,
Jan Gugenheimer
Abstract:
"Virtual-Physical Perceptual Manipulations" (VPPMs) such as redirected walking and haptics expand the user's capacity to interact with Virtual Reality (VR) beyond what would ordinarily physically be possible. VPPMs leverage knowledge of the limits of human perception to effect changes in the user's physical movements, becoming able to (perceptibly and imperceptibly) nudge their physical actions to…
▽ More
"Virtual-Physical Perceptual Manipulations" (VPPMs) such as redirected walking and haptics expand the user's capacity to interact with Virtual Reality (VR) beyond what would ordinarily physically be possible. VPPMs leverage knowledge of the limits of human perception to effect changes in the user's physical movements, becoming able to (perceptibly and imperceptibly) nudge their physical actions to enhance interactivity in VR. We explore the risks posed by the malicious use of VPPMs. First, we define, conceptualize and demonstrate the existence of VPPMs. Next, using speculative design workshops, we explore and characterize the threats/risks posed, proposing mitigations and preventative recommendations against the malicious use of VPPMs. Finally, we implement two sample applications to demonstrate how existing VPPMs could be trivially subverted to create the potential for physical harm. This paper aims to raise awareness that the current way we apply and publish VPPMs can lead to malicious exploits of our perceptual vulnerabilities.
△ Less
Submitted 26 February, 2022;
originally announced February 2022.
-
Optimizing Recursive Queries with Program Synthesis
Authors:
Yisu Remy Wang,
Mahmoud Abo Khamis,
Hung Q. Ngo,
Reinhard Pichler,
Dan Suciu
Abstract:
Most work on query optimization has concentrated on loop-free queries. However, data science and machine learning workloads today typically involve recursive or iterative computation. In this work, we propose a novel framework for optimizing recursive queries using methods from program synthesis. In particular, we introduce a simple yet powerful optimization rule called the "FGH-rule" which aims t…
▽ More
Most work on query optimization has concentrated on loop-free queries. However, data science and machine learning workloads today typically involve recursive or iterative computation. In this work, we propose a novel framework for optimizing recursive queries using methods from program synthesis. In particular, we introduce a simple yet powerful optimization rule called the "FGH-rule" which aims to find a faster way to evaluate a recursive program. The solution is found by making use of powerful tools, such as a program synthesizer, an SMT-solver, and an equality saturation system. We demonstrate the strength of the optimization by showing that the FGH-rule can lead to speedups up to 4 orders of magnitude on three, already optimized Datalog systems.
△ Less
Submitted 21 February, 2022;
originally announced February 2022.
-
State-of-the-Art in Smart Contact Lenses for Human Machine Interaction
Authors:
Yuanjie Xia,
Mohamed Khamis,
F. Anibal Fernandez,
Hadi Heidari,
Haider Butt,
Zubair Ahmed,
Tim Wilkinson,
Rami Ghannam
Abstract:
Contact lenses have traditionally been used for vision correction applications. Recent advances in microelectronics and nanofabrication on flexible substrates have now enabled sensors, circuits and other essential components to be integrated on a small contact lens platform. This has opened up the possibility of using contact lenses for a range of human-machine interaction applications including v…
▽ More
Contact lenses have traditionally been used for vision correction applications. Recent advances in microelectronics and nanofabrication on flexible substrates have now enabled sensors, circuits and other essential components to be integrated on a small contact lens platform. This has opened up the possibility of using contact lenses for a range of human-machine interaction applications including vision assistance, eye tracking, displays and health care. In this article, we systematically review the range of smart contact lens materials, device architectures and components that facilitate this interaction for different applications. In fact, evidence from our systematic review demonstrates that these lenses can be used to display information, detect eye movements, restore vision and detect certain biomarkers in tear fluid. Consequently, whereas previous state-of the-art reviews in contact lenses focused exclusively on biosensing, our systematic review covers a wider range of smart contact lens applications in HMI. Moreover, we present a new method of classifying the literature on smart contact lenses according to their six constituent building blocks, which are the sensing, energy management, driver electronics, communications, substrate and the interfacing modules. Based on recent developments in each of these categories, we speculate the challenges and opportunities of smart contact lenses for human-machine interaction. Moreover, we propose a novel self-powered smart contact lens concept with integrated energy harvesters, sensors and communication modules to enable autonomous operation. Our review is therefore a critical evaluation of current data and is presented with the aim of guiding researchers to new research directions in smart contact lenses.
△ Less
Submitted 5 April, 2022; v1 submitted 20 December, 2021;
originally announced December 2021.
-
Markov Switching Model for Driver Behavior Prediction: Use cases on Smartphones
Authors:
Ahmed B. Zaky,
Mohamed A. Khamis,
Walid Gomaa
Abstract:
Several intelligent transportation systems focus on studying the various driver behaviors for numerous objectives. This includes the ability to analyze driver actions, sensitivity, distraction, and response time. As the data collection is one of the major concerns for learning and validating different driving situations, we present a driver behavior switching model validated by a low-cost data col…
▽ More
Several intelligent transportation systems focus on studying the various driver behaviors for numerous objectives. This includes the ability to analyze driver actions, sensitivity, distraction, and response time. As the data collection is one of the major concerns for learning and validating different driving situations, we present a driver behavior switching model validated by a low-cost data collection solution using smartphones. The proposed model is validated using a real dataset to predict the driver behavior in short duration periods. A literature survey on motion detection (specifically driving behavior detection using smartphones) is presented. Multiple Markov Switching Variable Auto-Regression (MSVAR) models are implemented to achieve a sophisticated fitting with the collected driver behavior data. This yields more accurate predictions not only for driver behavior but also for the entire driving situation. The performance of the presented models together with a suitable model selection criteria is also presented. The proposed driver behavior prediction framework can potentially be used in accident prediction and driver safety systems.
△ Less
Submitted 29 August, 2021;
originally announced August 2021.
-
Can corrections to gravity at galactic distances be decisive to the problem of dark matter and dark energy?
Authors:
Timur F. Kamalov,
Olga A. Volkova,
Hassan M. H. Khamis
Abstract:
Are Dark Matter and Dark Energy the result of uncalculated addition derivatives? The need to introduce dark matter dark and energy becomes unnecessary if we consider that, the phenomenon of dark matter and dark energy is a result of not computing the additional derivatives of the equation of motion. For this purpose, we use higher derivatives in the form of non-local variables, known as the Ostrog…
▽ More
Are Dark Matter and Dark Energy the result of uncalculated addition derivatives? The need to introduce dark matter dark and energy becomes unnecessary if we consider that, the phenomenon of dark matter and dark energy is a result of not computing the additional derivatives of the equation of motion. For this purpose, we use higher derivatives in the form of non-local variables, known as the Ostrogradsky formalism. As a mathematician, Ostrogradsky considered the dependence of the Lagrange function on acceleration and its higher derivatives with respect to time. This is the case that fully correspond with the real frame of reference, and that can be both inertial and non-inertial frames. The problem of dark matter and dark energy presented starting from basic observations to explain the different results in theory and experiment. The study of galactic motion, especially the rotation curves, showed that a large amount of dark matter can be found mainly in galactic halos. The search for dark matter and dark energy has not confirmed with the experimental discovery of it, so we use Ostrogradsky formalities to explain the effects described above, so that the need to introduce dark matter and dark energy disappears.
△ Less
Submitted 14 July, 2021;
originally announced August 2021.
-
The Complexity of Boolean Conjunctive Queries with Intersection Joins
Authors:
Mahmoud Abo Khamis,
George Chichirim,
Antonia Kormpa,
Dan Olteanu
Abstract:
Intersection joins over interval data are relevant in spatial and temporal data settings. A set of intervals join if their intersection is non-empty. In case of point intervals, the intersection join becomes the standard equality join.
We establish the complexity of Boolean conjunctive queries with intersection joins by a many-one equivalence to disjunctions of Boolean conjunctive queries with e…
▽ More
Intersection joins over interval data are relevant in spatial and temporal data settings. A set of intervals join if their intersection is non-empty. In case of point intervals, the intersection join becomes the standard equality join.
We establish the complexity of Boolean conjunctive queries with intersection joins by a many-one equivalence to disjunctions of Boolean conjunctive queries with equality joins. The complexity of any query with intersection joins is that of the hardest query with equality joins in the disjunction exhibited by our equivalence. This is captured by a new width measure called the IJ-width.
We also introduce a new syntactic notion of acyclicity called iota-acyclicity to characterise the class of Boolean queries with intersection joins that admit linear time computation modulo a poly-logarithmic factor in the data size. Iota-acyclicity is for intersection joins what alpha-acyclicity is for equality joins. It strictly sits between gamma-acyclicity and Berge-acyclicity. The intersection join queries that are not iota-acyclic are at least as hard as the Boolean triangle query with equality joins, which is widely considered not computable in linear time.
△ Less
Submitted 14 April, 2022; v1 submitted 24 June, 2021;
originally announced June 2021.
-
Convergence of Datalog over (Pre-) Semirings
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Reinhard Pichler,
Dan Suciu,
Yisu Remy Wang
Abstract:
Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary s…
▽ More
Recursive queries have been traditionally studied in the framework of datalog, a language that restricts recursion to monotone queries over sets, which is guaranteed to converge in polynomial time in the size of the input. But modern big data systems require recursive computations beyond the Boolean space. In this paper we study the convergence of datalog when it is interpreted over an arbitrary semiring. We consider an ordered semiring, define the semantics of a datalog program as a least fixpoint in this semiring, and study the number of steps required to reach that fixpoint, if ever. We identify algebraic properties of the semiring that correspond to certain convergence properties of datalog programs. Finally, we describe a class of ordered semirings on which one can use the semi-naïve evaluation algorithm on any datalog program.
△ Less
Submitted 24 January, 2024; v1 submitted 30 May, 2021;
originally announced May 2021.
-
Decision Problems in Information Theory
Authors:
Mahmoud Abo Khamis,
Phokion G. Kolaitis,
Hung Q. Ngo,
Dan Suciu
Abstract:
Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into level…
▽ More
Constraints on entropies are considered to be the laws of information theory. Even though the pursuit of their discovery has been a central theme of research in information theory, the algorithmic aspects of constraints on entropies remain largely unexplored. Here, we initiate an investigation of decision problems about constraints on entropies by placing several different such problems into levels of the arithmetical hierarchy. We establish the following results on checking the validity over all almost-entropic functions: first, validity of a Boolean information constraint arising from a monotone Boolean formula is co-recursively enumerable; second, validity of "tight" conditional information constraints is in $Π^0_3$. Furthermore, under some restrictions, validity of conditional information constraints "with slack" is in $Σ^0_2$, and validity of information inequality constraints involving max is Turing equivalent to validity of information inequality constraints (with no max involved). We also prove that the classical implication problem for conditional independence statements is co-recursively enumerable.
△ Less
Submitted 27 April, 2020; v1 submitted 19 April, 2020;
originally announced April 2020.
-
"Please enter your PIN" -- On the Risk of Bypass Attacks on Biometric Authentication on Mobile Devices
Authors:
Christian Tiefenau,
Maximilian Häring,
Mohamed Khamis,
Emanuel von Zezschwitz
Abstract:
Nowadays, most mobile devices support biometric authentication schemes like fingerprint or face unlock. However, these probabilistic mechanisms can only be activated in combination with a second alternative factor, usually knowledge-based authentication. In this paper, we show that this aspect can be exploited in a bypass attack. In this bypass attack, the attacker forces the user to "bypass" the…
▽ More
Nowadays, most mobile devices support biometric authentication schemes like fingerprint or face unlock. However, these probabilistic mechanisms can only be activated in combination with a second alternative factor, usually knowledge-based authentication. In this paper, we show that this aspect can be exploited in a bypass attack. In this bypass attack, the attacker forces the user to "bypass" the biometric authentication by, for example, resetting the phone. This forces the user to enter an easy-to-observe passcode instead. We present the threat model and provide preliminary results of an online survey. Based on our results, we discuss potential countermeasures. We conclude that better feedback design and security-optimized fallback mechanisms can help further improve the overall security of mobile unlock mechanisms while preserving usability.
△ Less
Submitted 18 November, 2019;
originally announced November 2019.
-
Bag Query Containment and Information Theory
Authors:
Mahmoud Abo Khamis,
Phokion G. Kolaitis,
Hung Q. Ngo,
Dan Suciu
Abstract:
The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the…
▽ More
The query containment problem is a fundamental algorithmic problem in data management. While this problem is well understood under set semantics, it is by far less understood under bag semantics. In particular, it is a long-standing open question whether or not the conjunctive query containment problem under bag semantics is decidable. We unveil tight connections between information theory and the conjunctive query containment under bag semantics. These connections are established using information inequalities, which are considered to be the laws of information theory. Our first main result asserts that deciding the validity of maxima of information inequalities is many-one equivalent to the restricted case of conjunctive query containment in which the containing query is acyclic; thus, either both these problems are decidable or both are undecidable. Our second main result identifies a new decidable case of the conjunctive query containment problem under bag semantics. Specifically, we give an exponential time algorithm for conjunctive query containment under bag semantics, provided the containing query is chordal and admits a simple junction tree.
△ Less
Submitted 5 July, 2021; v1 submitted 24 June, 2019;
originally announced June 2019.
-
A Layered Aggregate Engine for Analytics Workloads
Authors:
Maximilian Schleich,
Dan Olteanu,
Mahmoud Abo Khamis,
Hung Q. Ngo,
XuanLong Nguyen
Abstract:
This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their data-intensive tasks can be decomposed into group-by aggregates over the join of the input database re…
▽ More
This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the observation that for a variety of analytics over databases, their data-intensive tasks can be decomposed into group-by aggregates over the join of the input database relations. We exemplify the versatility and competitiveness of LMFAO for a handful of widely used analytics: learning ridge linear regression, classification trees, regression trees, and the structure of Bayesian networks using Chow-Liu trees; and data cubes used for exploration in data warehousing.
LMFAO consists of several layers of logical and code optimizations that systematically exploit sharing of computation, parallelism, and code specialization.
We conducted two types of performance benchmarks. In experiments with four datasets, LMFAO outperforms by several orders of magnitude on one hand, a commercial database system and MonetDB for computing batches of aggregates, and on the other hand, TensorFlow, Scikit, R, and AC/DC for learning a variety of models over databases.
△ Less
Submitted 20 June, 2019;
originally announced June 2019.
-
Functional Aggregate Queries with Additive Inequalities
Authors:
Mahmoud Abo Khamis,
Ryan R. Curtin,
Benjamin Moseley,
Hung Q. Ngo,
XuanLong Nguyen,
Dan Olteanu,
Maximilian Schleich
Abstract:
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short.
To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositio…
▽ More
Motivated by fundamental applications in databases and relational machine learning, we formulate and study the problem of answering functional aggregate queries (FAQ) in which some of the input factors are defined by a collection of additive inequalities between variables. We refer to these queries as FAQ-AI for short.
To answer FAQ-AI in the Boolean semiring, we define relaxed tree decompositions and relaxed submodular and fractional hypertree width parameters. We show that an extension of the InsideOut algorithm using Chazelle's geometric data structure for solving the semigroup range search problem can answer Boolean FAQ-AI in time given by these new width parameters. This new algorithm achieves lower complexity than known solutions for FAQ-AI. It also recovers some known results in database query answering.
Our second contribution is a relaxation of the set of polymatroids that gives rise to the counting version of the submodular width, denoted by #subw. This new width is sandwiched between the submodular and the fractional hypertree widths. Any FAQ and FAQ-AI over one semiring can be answered in time proportional to #subw and respectively to the relaxed version of #subw.
We present three applications of our FAQ-AI framework to relational machine learning: k-means clustering, training linear support vector machines, and training models using non-polynomial loss. These optimization problems can be solved over a database asymptotically faster than computing the join of the database relations.
△ Less
Submitted 15 September, 2020; v1 submitted 22 December, 2018;
originally announced December 2018.
-
MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures
Authors:
Ahmed Fares,
Walid Gomaa,
Mohamed A. Khamis
Abstract:
The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls plus its complementary Dynamic Speed Limits (DSLs). An optimal freeway operation can be reached when minimizing the difference between the freeway density and the critical ratio for maximum traffic flow. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (…
▽ More
The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls plus its complementary Dynamic Speed Limits (DSLs). An optimal freeway operation can be reached when minimizing the difference between the freeway density and the critical ratio for maximum traffic flow. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (MARL-FWC) system for ramps metering and DSLs is proposed. MARL-FWC introduces a new microscopic framework at the network level based on collaborative Markov Decision Process modeling (Markov game) and an associated cooperative Q-learning algorithm. The technique incorporates payoff propagation (Max-Plus algorithm) under the coordination graphs framework, particularly suited for optimal control purposes. MARL-FWC provides three control designs: fully independent, fully distributed, and centralized; suited for different network architectures. MARL-FWC was extensively tested in order to assess the proposed model of the joint payoff, as well as the global payoff. Experiments are conducted with heavy traffic flow under the renowned VISSIM traffic simulator to evaluate MARL-FWC. The experimental results show a significant decrease in the total travel time and an increase in the average speed (when compared with the base case) while maintaining an optimal traffic flow.
△ Less
Submitted 27 August, 2018;
originally announced August 2018.
-
DialPlate: Enhancing the Detection of Smooth Pursuits Eye Movements Using Linear Regression
Authors:
Heiko Drewes,
Mohamed Khamis,
Florian Alt
Abstract:
We introduce and evaluate a novel approach for detecting smooth pursuit eye movements that increases the number of distinguishable targets and is more robust against false positives. Being natural and calibration-free, Pursuits has been gaining popularity in the past years. At the same time, current implementations show poor performance when more than eight on-screen targets are being used, thus l…
▽ More
We introduce and evaluate a novel approach for detecting smooth pursuit eye movements that increases the number of distinguishable targets and is more robust against false positives. Being natural and calibration-free, Pursuits has been gaining popularity in the past years. At the same time, current implementations show poor performance when more than eight on-screen targets are being used, thus limiting its applicability. Our approach (1) leverages the slope of a regression line, and (2) introduces a minimum signal duration that improves both the new and the traditional detection method. After introducing the approach as well as the implementation, we compare it to the traditional correlation-based Pursuits detection method. We tested the approach up to 24 targets and show that, if accepting a similar error rate, nearly twice as many targets can be distinguished compared to state of the art. For fewer targets, accuracy increases significantly. We believe our approach will enable more robust pursuit-based user interfaces, thus making it valuable for both researchers and practitioners.
△ Less
Submitted 10 July, 2018;
originally announced July 2018.
-
AC/DC: In-Database Learning Thunderstruck
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
XuanLong Nguyen,
Dan Olteanu,
Maximilian Schleich
Abstract:
We report on the design and implementation of the AC/DC gradient descent solver for a class of optimization problems over normalized databases. AC/DC decomposes an optimization problem into a set of aggregates over the join of the database relations. It then uses the answers to these aggregates to iteratively improve the solution to the problem until it converges.
The challenges faced by AC/DC a…
▽ More
We report on the design and implementation of the AC/DC gradient descent solver for a class of optimization problems over normalized databases. AC/DC decomposes an optimization problem into a set of aggregates over the join of the database relations. It then uses the answers to these aggregates to iteratively improve the solution to the problem until it converges.
The challenges faced by AC/DC are the large database size, the mixture of continuous and categorical features, and the large number of aggregates to compute. AC/DC addresses these challenges by employing a sparse data representation, factorized computation, problem reparameterization under functional dependencies, and a data structure that supports shared computation of aggregates.
To train polynomial regression models and factorization machines of up to 154K features over the natural join of all relations from a real-world dataset of up to 86M tuples, AC/DC needs up to 30 minutes on one core of a commodity machine. This is up to three orders of magnitude faster than its competitors R, MadLib, libFM, and TensorFlow whenever they finish and thus do not exceed memory limitation, 24-hour timeout, or internal design limitations.
△ Less
Submitted 15 June, 2018; v1 submitted 20 March, 2018;
originally announced March 2018.
-
Boolean Tensor Decomposition for Conjunctive Queries with Negation
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Dan Olteanu,
Dan Suciu
Abstract:
We propose an algorithm for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the best known algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width. The query complexity depends on the structure of the negated subquery; in general it…
▽ More
We propose an algorithm for answering conjunctive queries with negation, where the negated relations have bounded degree. Its data complexity matches that of the best known algorithms for the positive subquery of the input query and is expressed in terms of the fractional hypertree width and the submodular width. The query complexity depends on the structure of the negated subquery; in general it is exponential in the number of join variables occurring in negated relations yet it becomes polynomial for several classes of queries.
This algorithm relies on several contributions. We show how to rewrite queries with negation on bounded-degree relations into equivalent conjunctive queries with not-all-equal (NAE) predicates, which are a multi-dimensional analog of disequality (not-equal). We then generalize the known color-coding technique to conjunctions of NAE predicates and explain it via a Boolean tensor decomposition of conjunctions of NAE predicates. This decomposition can be achieved via a probabilistic construction that can be derandomized efficiently.
△ Less
Submitted 27 January, 2019; v1 submitted 20 December, 2017;
originally announced December 2017.
-
Learning Models over Relational Data using Sparse Tensors and Functional Dependencies
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
XuanLong Nguyen,
Dan Olteanu,
Maximilian Schleich
Abstract:
Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset…
▽ More
Integrated solutions for analytics over relational databases are of great practical importance as they avoid the costly repeated loop data scientists have to deal with on a daily basis: select features from data residing in relational databases using feature extraction queries involving joins, projections, and aggregations; export the training dataset defined by such queries; convert this dataset into the format of an external learning tool; and train the desired model using this tool. These integrated solutions are also a fertile ground of theoretically fundamental and challenging problems at the intersection of relational and statistical data models.
This article introduces a unified framework for training and evaluating a class of statistical learning models over relational databases. This class includes ridge linear regression, polynomial regression, factorization machines, and principal component analysis. We show that, by synergizing key tools from database theory such as schema information, query structure, functional dependencies, recent advances in query evaluation algorithms, and from linear algebra such as tensor and matrix operations, one can formulate relational analytics problems and design efficient (query and data) structure-aware algorithms to solve them.
This theoretical development informed the design and implementation of the AC/DC system for structure-aware learning. We benchmark the performance of AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting and advertisement planning applications, AC/DC can learn polynomial regression models and factorization machines with at least the same accuracy as its competitors and up to three orders of magnitude faster than its competitors whenever they do not run out of memory, exceed 24-hour timeout, or encounter internal design limitations.
△ Less
Submitted 6 February, 2020; v1 submitted 14 March, 2017;
originally announced March 2017.
-
Juggling Functions Inside a Database
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Atri Rudra
Abstract:
We define and study the Functional Aggregate Query (FAQ) problem, which captures common computational tasks across a very wide range of domains including relational databases, logic, matrix and tensor computation, probabilistic graphical models, constraint satisfaction, and signal processing. Simply put, an FAQ is a declarative way of defining a new function from a database of input functions.
W…
▽ More
We define and study the Functional Aggregate Query (FAQ) problem, which captures common computational tasks across a very wide range of domains including relational databases, logic, matrix and tensor computation, probabilistic graphical models, constraint satisfaction, and signal processing. Simply put, an FAQ is a declarative way of defining a new function from a database of input functions.
We present "InsideOut", a dynamic programming algorithm, to evaluate an FAQ. The algorithm rewrites the input query into a set of easier-to-compute FAQ sub-queries. Each sub-query is then evaluated using a worst-case optimal relational join algorithm. The topic of designing algorithms to optimally evaluate the classic multiway join problem has seen exciting developments in the past few years. Our framework tightly connects these new ideas in database theory with a vast number of application areas in a coherent manner, showing potentially that a good database engine can be a general-purpose constraint solver, relational data store, graphical model inference engine, and matrix/tensor computation processor all at once.
The InsideOut algorithm is very simple, as shall be described in this paper. Yet, in spite of solving an extremely general problem, its runtime either is as good as or improves upon the best known algorithm for the applications that FAQ specializes to. These corollaries include computational tasks in graphical model inference, matrix/tensor operations, relational joins, and logic. Better yet, InsideOut can be used within any database engine, because it is basically a principled way of rewriting queries. Indeed, it is already part of the LogicBlox database engine, helping efficiently answer traditional database queries, graphical model inference queries, and train a large class of machine learning models inside the database itself.
△ Less
Submitted 9 March, 2017;
originally announced March 2017.
-
What do Shannon-type Inequalities, Submodular Width, and Disjunctive Datalog have to do with one another?
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Dan Suciu
Abstract:
Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. We prove analogous output bounds for disjunctive datalog rules, and answer several open questions regarding the tightness and looseness of these bounds along the way. Our bounds are…
▽ More
Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. We prove analogous output bounds for disjunctive datalog rules, and answer several open questions regarding the tightness and looseness of these bounds along the way. Our bounds are intimately related to Shannon-type information inequalities. We devise the notion of a "proof sequence" of a specific class of Shannon-type information inequalities called "Shannon flow inequalities". We then show how such a proof sequence can be interpreted as symbolic instructions guiding an algorithm called "PANDA", which answers disjunctive datalog rules within the time that the size bound predicted. We show that PANDA can be used as a black-box to devise algorithms matching precisely the fractional hypertree width and the submodular width runtimes for aggregate and conjunctive queries with functional dependencies and degree constraints.
Our results improve upon known results in three ways. First, our bounds and algorithms are for the much more general class of disjunctive datalog rules, of which conjunctive queries are a special case. Second, the runtime of PANDA matches precisely the submodular width bound, while the previous algorithm by Marx has a runtime that is polynomial in this bound. Third, our bounds and algorithms work for queries with input cardinality bounds, functional dependencies, and degree constraints.
Overall, our results show a deep connection between three seemingly unrelated lines of research; and, our results on proof sequences for Shannon flow inequalities might be of independent interest.
△ Less
Submitted 23 December, 2023; v1 submitted 7 December, 2016;
originally announced December 2016.
-
Deep learning is competing random forest in computational docking
Authors:
Mohamed Khamis,
Walid Gomaa,
Basem Galal
Abstract:
Computational docking is the core process of computer-aided drug design; it aims at predicting the best orientation and conformation of a small drug molecule when bound to a target large protein receptor. The docking quality is typically measured by a scoring function: a mathematical predictive model that produces a score representing the binding free energy and hence the stability of the resultin…
▽ More
Computational docking is the core process of computer-aided drug design; it aims at predicting the best orientation and conformation of a small drug molecule when bound to a target large protein receptor. The docking quality is typically measured by a scoring function: a mathematical predictive model that produces a score representing the binding free energy and hence the stability of the resulting complex molecule. We analyze the performance of both learning techniques on the scoring power, the ranking power, docking power, and screening power using the PDBbind 2013 database. For the scoring and ranking powers, the proposed learning scoring functions depend on a wide range of features (energy terms, pharmacophore, intermolecular) that entirely characterize the protein-ligand complexes. For the docking and screening powers, the proposed learning scoring functions depend on the intermolecular features of the RF-Score to utilize a larger number of training complexes. For the scoring power, the DL\_RF scoring function achieves Pearson's correlation coefficient between the predicted and experimentally measured binding affinities of 0.799 versus 0.758 of the RF scoring function. For the ranking power, the DL scoring function ranks the ligands bound to fixed target protein with accuracy 54% for the high-level ranking and with accuracy 78% for the low-level ranking while the RF scoring function achieves (46% and 62%) respectively. For the docking power, the DL\_RF scoring function has a success rate when the three best-scored ligand binding poses are considered within 2 Å root-mean-square-deviation from the native pose of 36.0% versus 30.2% of the RF scoring function. For the screening power, the DL scoring function has an average enrichment factor and success rate at the top 1% level of (2.69 and 6.45%) respectively versus (1.61 and 4.84%) respectively of the RF scoring function.
△ Less
Submitted 23 August, 2016;
originally announced August 2016.
-
Computing Join Queries with Functional Dependencies
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Dan Suciu
Abstract:
Recently, Gottlob, Lee, Valiant, and Valiant (GLVV) presented an output size bound for join queries with functional dependencies (FD), based on a linear program on polymatroids. GLVV bound strictly generalizes the bound of Atserias, Grohe and Marx (AGM) for queries with no FD, in which case there are known algorithms running within AGM bound and thus are worst-case optimal.
A main result of this…
▽ More
Recently, Gottlob, Lee, Valiant, and Valiant (GLVV) presented an output size bound for join queries with functional dependencies (FD), based on a linear program on polymatroids. GLVV bound strictly generalizes the bound of Atserias, Grohe and Marx (AGM) for queries with no FD, in which case there are known algorithms running within AGM bound and thus are worst-case optimal.
A main result of this paper is an algorithm for computing join queries with FDs, running within GLVV bound up to a poly-log factor. In particular, our algorithm is worst-case optimal for any query where the GLVV bound is tight. As an unexpected by-product, our algorithm manages to solve a harder problem, where (some) input relations may have prescribed maximum degree bounds, of which both functional dependencies and cardinality bounds are special cases.
We extend Gottlob et al. framework by replacing all variable subsets with the lattice of closed sets (under the given FDs). This gives us new insights into the structure of the worst-case bound and worst-case instances. While it is still open whether GLVV bound is tight in general, we show that it is tight on distributive lattices and some other simple lattices. Distributive lattices capture a strict superset of queries with no FD and with simple FDs. We also present two simpler algorithms which are also worst-case optimal on distributive lattices within a single-$\log$ factor, but they do not match GLVV bound on a general lattice. Our algorithms are designed based on a novel principle: we turn a proof of a polymatroid-based output size bound into an algorithm.
△ Less
Submitted 6 April, 2016; v1 submitted 31 March, 2016;
originally announced April 2016.
-
FAQ: Questions Asked Frequently
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Atri Rudra
Abstract:
We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution.
We then present a simple algorithm called "InsideOut" to solve this general problem. InsideOut is a variation of the traditional dynamic pro…
▽ More
We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution.
We then present a simple algorithm called "InsideOut" to solve this general problem. InsideOut is a variation of the traditional dynamic programming approach for constraint programming based on variable elimination. Our variation adds a couple of simple twists to basic variable elimination in order to deal with the generality of FAQ, to take full advantage of Grohe and Marx's fractional edge cover framework, and of the analysis of recent worst-case optimal relational join algorithms.
As is the case with constraint programming and graphical model inference, to make InsideOut run efficiently we need to solve an optimization problem to compute an appropriate 'variable ordering'. The main technical contribution of this work is a precise characterization of when a variable ordering is 'semantically equivalent' to the variable ordering given by the input FAQ expression. Then, we design an approximation algorithm to find an equivalent variable ordering that has the best 'fractional FAQ-width'. Our results imply a host of known and a few new results in graphical model inference, matrix operations, relational joins, and logic.
We also briefly explain how recent algorithms on beyond worst-case analysis for joins and those for solving SAT and #SAT can be viewed as variable elimination to solve FAQ over compactly represented input functions.
△ Less
Submitted 23 December, 2023; v1 submitted 15 April, 2015;
originally announced April 2015.
-
Sparse Approximation, List Decoding, and Uncertainty Principles
Authors:
Mahmoud Abo Khamis,
Anna C. Gilbert,
Hung Q. Ngo,
Atri Rudra
Abstract:
We consider list versions of sparse approximation problems, where unlike the existing results in sparse approximation that consider situations with unique solutions, we are interested in multiple solutions. We introduce these problems and present the first combinatorial results on the output list size. These generalize and enhance some of the existing results on threshold phenomenon and uncertaint…
▽ More
We consider list versions of sparse approximation problems, where unlike the existing results in sparse approximation that consider situations with unique solutions, we are interested in multiple solutions. We introduce these problems and present the first combinatorial results on the output list size. These generalize and enhance some of the existing results on threshold phenomenon and uncertainty principles in sparse approximations. Our definitions and results are inspired by similar results in list decoding. We also present lower bound examples that bolster our results and show they are of the appropriate size.
△ Less
Submitted 8 August, 2014; v1 submitted 18 April, 2014;
originally announced April 2014.
-
Joins via Geometric Resolutions: Worst-case and Beyond
Authors:
Mahmoud Abo Khamis,
Hung Q. Ngo,
Christopher Ré,
Atri Rudra
Abstract:
We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework…
▽ More
We present a simple geometric framework for the relational join. Using this framework, we design an algorithm that achieves the fractional hypertree-width bound, which generalizes classical and recent worst-case algorithmic results on computing joins. In addition, we use our framework and the same algorithm to show a series of what are colloquially known as beyond worst-case results. The framework allows us to prove results for data stored in Btrees, multidimensional data structures, and even multiple indices per table. A key idea in our framework is formalizing the inference one does with an index as a type of geometric resolution; transforming the algorithmic problem of computing joins to a geometric problem. Our notion of geometric resolution can be viewed as a geometric analog of logical resolution. In addition to the geometry and logic connections, our algorithm can also be thought of as backtracking search with memoization.
△ Less
Submitted 23 December, 2016; v1 submitted 2 April, 2014;
originally announced April 2014.
-
Crystallographic Etching of Few-Layer Graphene
Authors:
Sujit S. Datta,
Douglas R. Strachan,
Samuel M. Khamis,
A. T. Charlie Johnson
Abstract:
We demonstrate a method by which few-layer graphene samples can be etched along crystallographic axes by thermally activated metallic nanoparticles. The technique results in long (>1 micron) crystallographic edges etched through to the insulating substrate, making the process potentially useful for atomically precise graphene device fabrication. This advance could enable atomically precise const…
▽ More
We demonstrate a method by which few-layer graphene samples can be etched along crystallographic axes by thermally activated metallic nanoparticles. The technique results in long (>1 micron) crystallographic edges etched through to the insulating substrate, making the process potentially useful for atomically precise graphene device fabrication. This advance could enable atomically precise construction of integrated circuits from single graphene sheets with a wide range of technological applications.
△ Less
Submitted 24 June, 2008;
originally announced June 2008.