×

Using trees to mine multirelational databases. (English) Zbl 1235.68067

Summary: This paper proposes a new approach to mine multirelational databases. Our approach is based on the representation of multirelational databases as sets of trees, for which we propose two alternative representation schemes. Tree mining techniques can thus be applied as the basis for multirelational data mining techniques, such as multirelational classification or multirelational clustering. We analyze the differences between identifying induced and embedded tree patterns in the proposed tree-based representation schemes and we study the relationships among the sets of tree patterns that can be discovered in each case. This paper also describes how these frequent tree patterns can be used, for instance, to mine association rules in multirelational databases.

MSC:

68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence

Software:

CrossMine; Warmr
Full Text: DOI

References:

[1] Abe K, Kawasoe S, Asai T, Arimura H, Arikawa S (2002) Efficient substructure discovery from large semi-structured data. In: Proceedings of the 2nd SIAM international conference on data mining, pp 158–174 · Zbl 1020.68521
[2] Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. In: Proceedings of the 20th international conference on very large data bases, 12–15 Sept, pp 487–499
[3] Bayardo RJ (2004) The hows, whys, and whens of constraints in itemset and rule discovery. In: Constraint-based mining and inductive databases, lecture notes in artificial intelligence, pp 1–13
[4] Berzal F, Blanco I, Sánchez D, Vila MA (2002) Measuring the accuracy and interest of association rules: a new framework. Intell Data Anal 6(3): 221–235 · Zbl 1088.68690
[5] Berzal F, Cubero JC, Sánchez D, Serrano JM (2004) ART: a hybrid classification model. Mach Learn 54(1): 67–92 · Zbl 1075.68631 · doi:10.1023/B:MACH.0000008085.22487.a6
[6] Blockeel H, Raedt LD (1998) Top-down induction of first-order logical decision trees. Artif Intell 101 (1–2): 285–297 · Zbl 0909.68034 · doi:10.1016/S0004-3702(98)00034-4
[7] Booch G, Rumbaugh J, Jacobson I (2005) The unified modeling language user guide, 2nd edn. Addison-Wesley Professional, New York
[8] Chi Y, Yang Y, Muntz RR (2003) Indexing and mining free trees. In: Proceedings of the 3rd IEEE international conference on data mining, pp 509–512
[9] Chi Y, Muntz RR, Nijssen S, Kok JN (2005) Frequent subtree mining–an overview. Fundam Inform 66(1–2): 161–198 · Zbl 1096.68044
[10] Codd EF (1990) The relational model for database management, version 2. Addison-Wesley, New York · Zbl 0809.68056
[11] De Knijf J (2006) FAT-miner: mining frequent attribute trees. Tech. Rep. UU-CS-2006-053, Department of Information and Computing Sciences, Utrecht University
[12] De Knijf J (2007) FAT-miner: mining frequent attribute trees. In: Proceedings of the 2007 ACM symposium on applied computing. ACM, New York, pp 417–422
[13] Džeroski S (2003) Multi-relational data mining: an introduction. SIGKDD Explor Newsl 5(1): 1–16 · doi:10.1145/959242.959245
[14] Fagin R, Mendelzon AO, Ullman JD (1982) A simplied universal relation assumption and its properties. ACM Trans Database Syst 7: 343–360 · Zbl 0488.68069 · doi:10.1145/319732.319735
[15] Garcia-Molina H, Ullman JD, Widom J (2008) Database systems: the complete book. Pearson Education, Boston
[16] Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1): 53–87 · doi:10.1023/B:DAMI.0000005258.31418.83
[17] Jimenez A, Berzal F, Cubero JC (2010a) Frequent tree pattern mining: a survey. Intell Data Anal 14(6): 603–622
[18] Jimenez A, Berzal F, Cubero JC (2010b) POTMiner: mining ordered, unordered, and partially-ordered trees. Knowl Inform Syst 23(2): 199–224 · doi:10.1007/s10115-009-0213-3
[19] King RD, Srinivasan A, Dehaspe L (2001) Warmr: a data mining tool for chemical data. J Comput-Aided Mol Des 15(2): 173–181 · doi:10.1023/A:1008171016861
[20] Krogel MA, Wrobel S (2003) Facets of aggregation approaches to propositionalization. In: Horvath T, Yamamoto A (eds) Work-in-progress track at the thirteenth international conference on inductive logic programming
[21] Lee AJT, Wang CS (2007) An efficient algorithm for mining frequent inter-transaction patterns. Inform Sci 177(17): 3453–3476 · doi:10.1016/j.ins.2007.03.007
[22] Leiva HA, Gadia S, Dobbs D (2002) MRDTL: a multi-relational decision tree learning algorithm. In: Proceedings of the 13th international conference on inductive logic programming. Springer-Verlag, pp 38–56
[23] Maier D, Ullman JD (1983) Maximal objects and the semantics of universal relation databases. ACM Trans Database Syst 8: 1–14 · Zbl 0536.68081 · doi:10.1145/319830.319831
[24] Maier D, Ullman JD, Vardi MY (1984) On the foundations of the universal relation model. ACM Trans Database Syst 9: 283–308 · Zbl 0563.68077 · doi:10.1145/329.318580
[25] McGovern A, Hiers NC, Collier M, II DJG, Brown RA (2008) Spatiotemporal relational probability trees: an introduction. In: Proceedings of the 8th IEEE international conference on data mining. IEEE Computer Society, pp 935–940
[26] Neville J, Jensen D, Friedland L, Hay M (2003) Learning relational probability trees. In: Proceedings of the 9th ACM SIGKDD international conference on knowledge discovery and data mining, pp 625–630
[27] Paterson J, Edlich S, Hörning H, Hörning R (2006) The definitive guide to db4o. Apress, New York
[28] Pei J, Han J (2002) Constrained frequent pattern mining: a pattern-growth view. SIGKDD Explor Newsl 4(1): 31–39 · doi:10.1145/568574.568580
[29] Perlich C, Provost F (2006) Distribution-based aggregation for relational learning with identifier attributes. Mach Learn 62: 65–105 · Zbl 1470.68158 · doi:10.1007/s10994-006-6064-1
[30] Silberschatz A, Korth HF, Sudarshan S (2001) Database systems concepts. McGraw-Hill, New York
[31] Srikant R, Vu Q, Agrawal R (1997) Mining association rules with item constraints. In: Proceedings of the 3rd international conference of knowledge discovery and data mining, pp 63–73
[32] Srinivasan A, Muggleton SH, King R, Sternberg M (1994) Mutagenesis: ILP experiments in a non-determinate biological domain. In: Proceedings of the 4th international workshop on inductive logic programming, vol 237 of GMD-Studien, pp 217–232
[33] Tung AKH, Lu H, Han J, Feng L (2003) Efficient mining of intertransaction association rules. IEEE Trans Knowl Data Eng 15(1): 43–56 · doi:10.1109/TKDE.2003.1161581
[34] Turmeaux T, Salleb A, Vrain C, Cassard D (2003) Learning characteristic rules relying on quantified paths. In: Proceedings of the 7th European conference on principles and practice of knowledge discovery in databases, pp 471–482
[35] Ullman JD (1988) Principles of database and knowledge-base systems, vol I: classical database systems. Computer Science Press Inc., New York
[36] Ullman JD (1990) Principles of database and knowledge-base systems, vol II: the new technologies. W. H. Freeman & Co., New York
[37] Wang C, Hong M, Pei J, Zhou H, Wang W, Shi B (2004) Efficient pattern-growth methods for frequent tree pattern mining. In: Proceedings of the 8th Pacific-Asia conference on knowledge discovery and data mining. Lecture Notes in Computer Science, vol 3056, Springer, pp 441–451
[38] Xiao Y, Yao JF, Li Z, Dunham MH (2003) Efficient data mining for maximal frequent subtrees. In: Proceedings of the 3rd IEEE international conference on data mining, pp 379–386
[39] Yin X, Han J, Yang J, Yu PS (2004) CrossMine: efficient classification across multiple database relations. In: Proceedings of the 20th international conference on data engineering, pp 399–410 · Zbl 1172.68458
[40] Yin X, Han J, Yu PS (2005) Cross-relational clustering with user’s guidance. In: Proceedings of the 12th international conference on knowledge discovery and data mining, pp 344–353
[41] Zaki MJ (2005a) Efficiently mining frequent embedded unordered trees. Fundam Inform 66(1–2): 33–52 · Zbl 1098.68579
[42] Zaki MJ (2005b) Efficiently mining frequent trees in a forest: algorithms and applications. IEEE Trans Knowl Data Eng 17(8): 1021–1035 · doi:10.1109/TKDE.2005.125
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.