
An inductive database system based on virtual mining views. (English) Zbl 1235.68064

Summary: Inductive databases integrate database querying with database mining. In this article, we present an inductive database system that does not rely on a new data mining query language, but on plain SQL. We propose an intuitive and elegant framework based on virtual mining views, which are relational tables that virtually contain the complete output of data mining algorithms executed over a given data table. We show that several types of patterns and models that are implicitly present in the data, such as itemsets, association rules, and decision trees, can be represented and queried with SQL using a unifying framework. As a proof of concept, we illustrate a complete data mining scenario with SQL queries over the mining views, which is executed in our system.
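The idea of querying mining results with plain SQL can be illustrated with a minimal sketch; this is not the authors' system. The toy transactions, the `itemsets` table, and its columns are hypothetical names chosen for illustration and are not taken from the article. The sketch also materializes the table eagerly by brute force, whereas the mining views described in the summary are virtual and only conceptually contain the complete mining output.

```python
import sqlite3
from itertools import combinations

# Hypothetical toy transactions; item names are illustrative only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def build_itemset_table(conn, transactions):
    """Store every itemset that occurs in the data, with its support.

    Assumption: brute-force enumeration stands in for a real mining
    algorithm, and the table is materialized rather than virtual.
    """
    conn.execute(
        "CREATE TABLE itemsets (items TEXT PRIMARY KEY, size INTEGER, support INTEGER)"
    )
    items = sorted(set().union(*transactions))
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            # Support = number of transactions containing all items in combo.
            support = sum(1 for t in transactions if set(combo) <= t)
            if support > 0:
                conn.execute(
                    "INSERT INTO itemsets VALUES (?, ?, ?)",
                    (",".join(combo), k, support),
                )


def frequent_itemsets(conn, min_support):
    """Select frequent itemsets with an ordinary SQL query."""
    rows = conn.execute(
        "SELECT items, support FROM itemsets "
        "WHERE support >= ? ORDER BY size, items",
        (min_support,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
build_itemset_table(conn, TRANSACTIONS)
result = frequent_itemsets(conn, 2)
```

In this sketch the minimum-support constraint is just a `WHERE` clause, which mirrors the summary's point that no dedicated data mining query language is needed once patterns are exposed as relational tables.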

MSC:

68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence
68P10 Searching and sorting

Software:

arules; UCI-ml
