×

RQL: a query language for rule discovery in databases. (English) Zbl 1356.68051

Summary: Promoting declarative approaches in data mining is a long standing theme, the main idea being to simplify as much as possible the way data analysts interact with their data. This paper goes into this direction by proposing a well-founded logical query language, Safe\(\mathcal{RL}\), allowing the expression of a wide variety of rules to be discovered against a database. By rules, we mean statements of the form “if \(\dots\) then \(\dots\)”, as defined in logics for “implications” between boolean variables. As a consequence, Safe\(\mathcal{RL}\) extends and generalizes functional dependencies to new and unexpected rules. We provide a query rewriting technique and a constructive proof of the main query equivalence theorem, leading to an efficient query processing technique. From Safe\(\mathcal{RL}\), we have devised RQL, a user-friendly SQL-like query language. We have shown how a tight integration can be performed on top of any relational database management system. Every RQL query turns out to be seen as a query processing problem, instead of a particular rule mining problem. This approach has been implemented and experimented on sensor network data. A web prototype has been released and is freely available (http://rql.insa-lyon.fr). Data analysts can upload a sample of their data, write their own RQL queries and get answers to know whether or not a rule holds (if not, a counterexample from the database is displayed) and much more.

MSC:

68P15 Database theory
68T30 Knowledge representation

Software:

UCI-ml; TANE
Full Text: DOI

References:

[1] Abiteboul, Serge; Hull, Richard; Vianu, Victor, Foundations of Databases (1995), Addison-Wesley · Zbl 0848.68031
[2] Adaricheva, Kira V.; Nation, James B., On implicational bases of closure systems with unique critical sets, Discrete Appl. Math., 162, 51-69 (2014) · Zbl 1341.06003
[3] Adaricheva, Kira V.; Nation, James B.; Okimoto, Gordon; Adarichev, Vyacheslav; Amanbekkyzy, Adina; Sarkar, Shuchismita; Sailanbayev, Alibek; Seidalin, Nazar; Alibek, Kenneth, Measuring the implications of the d-basis in analysis of data in biomedical studies, (Formal Concept Analysis - 13th International Conference, ICFCA 2015, Proceedings. Formal Concept Analysis - 13th International Conference, ICFCA 2015, Proceedings, Nerja, Spain, June 23-26, 2015 (2015)), 39-57 · Zbl 1312.68178
[4] Adaricheva, Kira V.; Nation, James B.; Rand, Robert, Ordered direct implicational basis of a finite closure system, Discrete Appl. Math., 161, 6, 707-723 (2013) · Zbl 1288.06007
[5] Agier, Marie; Froidevaux, Christine; Petit, Jean-Marc; Renaud, Yoan; Wijsen, Jef, On Armstrong-compliant Logical Query Languages, (Fletcher, George H. L.; Staworko, Slawek, 4th International Workshop on Logic in Databases (LID 2011) in Conjunction with EDBT/ICDT Conference (March 2011), ACM), 33-40
[6] Agier, Marie; Petit, Jean-Marc; Suzuki, Einoshin, Unifying framework for rule semantics: application to gene expression data, Fund. Inform., 78, 4, 543-559 (2007) · Zbl 1119.68144
[7] Arimura, Hiroki; Uno, Takeaki, Polynomial-delay and polynomial-space algorithms for mining closed sequences, graphs, and pictures in accessible set systems, (SDM (2009)), 1088-1099
[8] Armstrong, William Ward, Dependency structures of data base relationships, (Proceedings of the IFIP Congress (1974)), 580-583 · Zbl 0296.68038
[9] Babin, Mikhail A.; Kuznetsov, Sergei O., Computing premises of a minimal cover of functional dependencies is intractable, Discrete Appl. Math., 161, 6, 742-749 (2013) · Zbl 1263.68046
[10] Baudinet, Marianne; Chomicki, Jan; Wolper, Pierre, Constraint-generating dependencies, J. Comput. System Sci., 59, 1, 94-115 (1999) · Zbl 0939.68030
[11] Bertet, Karell; Monjardet, Bernard, The multiple facets of the canonical direct unit implicational basis, Theoret. Comput. Sci., 411, 22-24, 2155-2166 (2010) · Zbl 1209.68187
[12] Blockeel, Hendrik; Calders, Toon; Fromont, Elisa; Goethals, Bart; Prado, Adriana; Robardet, Céline, A practical comparative study of data mining query languages, (Džeroski, Sašo; Goethals, Bart; Panov, Panče, Inductive Databases and Constraint-Based Data Mining (2010), Springer: Springer New York), 59-77 · Zbl 1211.68144
[13] Blockeel, Hendrik; Calders, Toon; Fromont, Élisa; Goethals, Bart; Prado, Adriana; Robardet, Céline, An inductive database system based on virtual mining views, Data Min. Knowl. Discov., 24, 1, 247-287 (2012) · Zbl 1235.68064
[14] Bohannon, Philip; Fan, Wenfei; Geerts, Floris; Jia, Xibei; Kementsietsidis, Anastasios, Conditional functional dependencies for data cleaning, (Proceedings of the 23rd International Conference on Data Engineering, ICDE ’07 (2007)), 746-755
[15] Toon, Calders; Wijsen, Jef, On monotone data mining languages, (Proceedings of the 8th International Workshop on Database Programming Languages, DBPL ’01 (2001)), 119-132 · Zbl 1098.68559
[16] Caspard, Nathalie; Monjardet, Bernard, The lattices of closure systems, closure operators, and implicational systems on a finite set: a survey, Discrete Appl. Math., 127, 2, 241-269 (2003) · Zbl 1026.06008
[17] Chardin, Brice; Coquery, Emmanuel; Gouriou, Benjamin; Pailloux, Marie; Petit, Jean-Marc, Query rewriting for rule mining in databases, (Crémilleux, Bruno; De Raedt, Luc; Frasconi, Paolo; Guns, Tias, Languages for Data Mining and Machine Learning (LML) Workshop@ECML/PKDD 2013 (September 2013)), 1-16
[18] Chardin, Brice; Coquery, Emmanuel; Pailloux, Marie; Petit, Jean-Marc, RQL: an SQL-like query language for discovering meaningful rules (demo), (IEEE ICDM 2014, Shengzen, China (December 2014)) · Zbl 1356.68051
[19] Chu, Xu; Ilyas, Ihab F.; Papotti, Paolo, Discovering denial constraints, Proc. VLDB Endow., 6, 13, 1498-1509 (August 2013)
[20] Demetrovics, János; Duc Thi, Vu, Some remarks on generating Armstrong and inferring functional dependencies relation, Acta Cybernet., 12, 2, 167-180 (1995) · Zbl 0840.68049
[21] Fan, Wenfei; Geerts, Floris, Uniform dependency language for improving data quality, IEEE Data Eng. Bull., 34, 3, 34-42 (2011)
[22] Fang, Lujun; LeFevre, Kristen, Splash: ad-hoc querying of data and statistical models, (Proceedings of the 13th International Conference on Extending Database Technology, EDBT ’10 (2010)), 275-286
[23] Flach, Peter A.; Savnik, Iztok, Database dependency discovery: a machine learning approach, AI Commun., 12, 3, 139-160 (1999)
[24] Flouvat, Frédéric; De Marchi, Fabien; Petit, Jean-Marc, The iZi project: easy prototyping of interesting pattern mining algorithms, (New Frontiers in Applied Data Mining. New Frontiers in Applied Data Mining, Lecture Notes in Comput. Sci., vol. 5669 (2010), Springer), 1-15
[25] Fredman, Michael L.; Khachiyan, Leonid, On the complexity of dualization of monotone disjunctive normal forms, J. Algorithms, 21, 3, 618-628 (1996) · Zbl 0864.68038
[26] Ganter, Bernhard; Wille, Rudolf, Formal Concept Analysis (1999), Springer · Zbl 0909.06001
[27] Golab, Lukasz; Karloff, Howard J.; Korn, Flip; Saha, Avishek; Srivastava, Divesh, Sequential dependencies, Proc. VLDB Endow., 2, 1, 574-585 (2009)
[28] Golab, Lukasz; Karloff, Howard J.; Korn, Flip; Srivastava, Divesh, Data auditor: exploring data quality and semantics using pattern tableaux, Proc. VLDB Endow., 3, 2, 1641-1644 (2010)
[29] Gottlob, Georg; Libkin, Leonid, Investigations on Armstrong relations, dependency inference, and excluded functional dependencies, Acta Cybernet., 9, 4, 385-402 (1990) · Zbl 0727.68026
[30] Guigues, Jean-Louis; Duquenne, Vincent, Familles minimales d’implications informatives résultant d’un tableau de données binaires, Math. Sci. Hum., 24, 95, 5-18 (1986) · Zbl 1504.68217
[31] Guns, Tias; Nijssen, Siegfried; De Raedt, Luc, Itemset mining: a constraint programming perspective, Artificial Intelligence, 175, 12-13, 1951-1983 (2011) · Zbl 1353.68233
[32] Huhtala, Ykä; Kärkkäinen, Juha; Porkka, Pasi; Toivonen, Hannu, TANE: an efficient algorithm for discovering functional and approximate dependencies, Comput. J., 42, 2, 100-111 (1999) · Zbl 0944.68054
[33] Imielinski, Tomasz; Mannila, Heikki, A database perspective on knowledge discovery, Commun. ACM, 39, 11, 58-64 (1996)
[34] Intille, Stephen S.; Larson, Kent; Tapia, Emmanuel Munguia; Beaudin, Jennifer S.; Kaushik, Pallavi; Nawyn, Jason; Rockinson, Randy, Using a live-in laboratory for ubiquitous computing research, (PERVASIVE ’06 (2006)), 349-365
[35] Koudas, Nick; Saha, Avishek; Srivastava, Divesh; Venkatasubramanian, Suresh, Metric functional dependencies, (ICDE (2009)), 1275-1278
[36] Lichman, Moshe, UCI Machine Learning Repository (2016)
[37] Lopes, Stéphane; Petit, Jean-Marc; Lakhal, Lotfi, Efficient discovery of functional dependencies and Armstrong relations, (EDBT 2000 (2000)), 350-364
[38] Lopes, Stéphane; Petit, Jean-Marc; Lakhal, Lotfi, Functional and approximate dependency mining: database and FCA points of view, J. Exp. Theor. Artif. Intell., 14, 2-3, 93-114 (2002) · Zbl 1024.68027
[39] Mannila, Heikki; Räihä, Kari-Jouko, Algorithms for inferring functional dependencies from relations, Data Knowl. Eng., 12, 1, 83-99 (1994) · Zbl 0805.68034
[40] Mannila, Heikki; Toivonen, Hannu, Levelwise search and borders of theories in knowledge discovery, Data Min. Knowl. Discov., 1, 3, 241-258 (1997)
[42] Medina, Raoul; Nourine, Lhouari, A unified hierarchy for functional dependencies, conditional functional dependencies and association rules, (ICFCA. ICFCA, Lecture Notes in Comput. Sci., vol. 5548 (2009), Springer), 98-113 · Zbl 1248.68184
[43] Meo, Rosa; Psaila, Giuseppe; Ceri, Stefano, An extension to SQL for mining association rules, Data Min. Knowl. Discov., 2, 2, 195-224 (1998)
[44] Métivier, Jean-Philippe; Boizumault, Patrice; Crémilleux, Bruno; Khiari, Mehdi; Loudni, Samir, A constraint-based language for declarative pattern discovery, (Declarative Pattern Mining Workshop, ICDM 2011 (2011)), 1112-1119
[45] Murakami, Keisuke; Uno, Takeaki, Efficient algorithms for dualizing large-scale hypergraphs (2011), CoRR · Zbl 1288.05188
[46] Négrevergne, Benjamin; Termier, Alexandre; Rousset, Marie-Christine; Méhaut, Jean-François, Para Miner: a generic pattern mining algorithm for multi-core architectures, Data Min. Knowl. Discov., 28, 3, 593-633 (2014) · Zbl 1294.68120
[47] Nourine, Lhouari; Petit, Jean-Marc, Extending set-based dualization: application to pattern mining, (ECAI 2012 (August 2012), IOS Press) · Zbl 1327.68281
[48] Nourine, Lhouari; Petit, Jean-Marc, Extended dualization: application to maximal pattern mining, Theoret. Comput. Sci., 618, 107-121 (2016) · Zbl 1335.68081
[49] Ordonez, Carlos; Pitchaimalai, Sasi K., One-pass data mining algorithms in a DBMS with UDFs, (SIGMOD Conference (2011)), 1217-1220
[50] Papenbrock, Thorsten; Bergmann, Tanja; Finke, Moritz; Zwiener, Jakob; Naumann, Felix, Data profiling with metanome, Proc. VLDB Endow., 8, 12, 1860-1871 (2015)
[51] Papenbrock, Thorsten; Ehrlich, Jens; Marten, Jannik; Neubert, Tommy; Rudolph, Jan-Peer; Schönberg, Martin; Zwiener, Jakob; Naumann, Felix, Functional dependency discovery: an experimental evaluation of seven algorithms, Proc. VLDB Endow., 8, 10, 1082-1093 (2015)
[52] Vincent, Jonathan; Martre, Pierre; Gouriou, Benjamin; Ravel, Catherine; Dai, Zhanwu; Petit, Jean-Marc; Pailloux, Marie, RulNet: a web-oriented platform for regulatory network inference, application to wheat -omics data, PLoS ONE, 10, 5, 20 (May 2015)
[53] Wijsen, Jef, Temporal dependencies, (Encyclopedia of Database Systems (2009)), 2960-2966
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.