×

Succint minimal generators: theoretical foundations and applications. (English) Zbl 1156.68534

Summary: In data mining applications, highly sized contexts are handled what usually results in a considerably large set of frequent itemsets, even for high values of the minimum support threshold. An interesting solution consists then in applying an appropriate closure operator that structures frequent itemsets into equivalence classes, such that two itemsets belong to the same class if they appear in the same sets of objects. Among equivalent itemsets, minimal elements (w.r.t. the number of items) are called Minimal Generators (MGs), while their associated closure is called Closed Itemset (CI), and is the largest one within the corresponding equivalence class. Thus, the pairs – composed by MGs and their associated CIs – make easier localizing each itemset since it is necessarily encompassed by an MG and an CI. In addition, they offer informative implication/association rules, with minimal premises and maximal conclusions, which losslessly represent the entire rule set. These important concepts – MG and CI – were hence at the origin of various works. Nevertheless, the inherent absence of a unique MG associated to a given CI leads to an intra-class combinatorial redundancy that leads an exhaustive storage and impractical use. This motivated an in-depth study towards a lossless reduction of this redundancy.
This study was started by G. Dong, C. Jiang, J. Pei, J. Li and L. Wong [(*) “Mining succinct systems of minimal generators of formal concepts”, Lect. Notes Comput. Sci. 3453, 175–187 (2005)], who introduced the succinct system of minimal generators (SSMG) as an attempt to eliminate the redundancy within this set. In this paper, we give a thorough study of the SSMG as formerly defined in (*). This system will be shown to suffer from some flaws. As a remedy, we introduce a new lossless reduction of the MG set allowing to overcome its limitations. The new SSMG will then be incorporated into the framework of generic bases of association rules. This makes it possible to only maintain succinct and informative rules. After that, we give a thorough formal study of the related inference mechanisms allowing to derive all redundant association rules, starting from the maintained ones. Finally, an experimental evaluation shows the utility of our approach towards eliminating important rate of redundant information.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68P15 Database theory
68T10 Pattern recognition, speech recognition
Full Text: DOI

References:

[1] DOI: 10.1145/380995.381017 · doi:10.1145/380995.381017
[2] DOI: 10.1145/1147234.1147248 · doi:10.1145/1147234.1147248
[3] Berge C., Hypergraphs (1989)
[4] Bonchi F., Journal of Knowledge and Information Systems pp 1–
[5] DOI: 10.1023/A:1021571501451 · doi:10.1023/A:1021571501451
[6] DOI: 10.1016/j.jmaa.2005.05.055 · Zbl 1098.46021 · doi:10.1016/j.jmaa.2005.05.055
[7] T. Calders, C. Rigotti and J. F. Boulicaut, Constraint Based Mining and Inductive Databases, LNAI 3848 (Springer-Verlag, 2005) pp. 64–80.
[8] Ceglar A., ACM Computing Surveys 38
[9] DOI: 10.1007/978-3-642-59830-2 · doi:10.1007/978-3-642-59830-2
[10] Guigues J. L., Mathématiques et Sciences Humaines 24 pp 5–
[11] DOI: 10.1007/s10618-006-0059-1 · doi:10.1007/s10618-006-0059-1
[12] Luxenburger M., Mathématiques, Informatique et Sciences Humaines 29 pp 35–
[13] Maier D., The theory of Relational Databases (1983) · Zbl 0519.68082
[14] Pasquier N., Journal of Intelligent Information Systems 24 pp 25–
[15] DOI: 10.1016/0005-1098(78)90005-5 · Zbl 0418.93079 · doi:10.1016/0005-1098(78)90005-5
[16] Stumme G., Journal on Knowledge and Data Engineering (KDE) 2 pp 189–
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.