A time-efficient breadth-first level-wise lattice-traversal algorithm to discover rare itemsets. (English) Zbl 1294.68074

Summary: In this paper we address the problem of searching for rare itemsets. A main issue is the strategy to adopt in exploring the power set lattice. Assuming a power set lattice with the full set at the top and the empty set at the bottom, most algorithms adopt a bottom-up exploration, i.e., moving from smaller to larger sets. Although this approach is advantageous for frequent itemsets, it may not be worthwhile for rare itemsets, as they occur at the top of the lattice. We propose Rarity, a top-down breadth-first level-wise algorithm. Experimental results and comparisons are presented to provide a quantitative characterization of the algorithm's performance and complexity. Applications to some UCI benchmark and real-world datasets are provided. A parallelization of the algorithm is outlined. Experiments showed that this approach finds all rare non-zero itemsets in less time than other solutions, at the expense of higher memory demand.
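The top-down, level-wise idea described in the summary can be illustrated with a minimal Python sketch. This is not the authors' Rarity implementation; it only assumes that an itemset is "rare non-zero" when its support (the number of transactions containing it) is at least 1 and strictly below a threshold. The function name `rare_itemsets_top_down` and the parameter `min_support` are hypothetical, introduced here for illustration. The sketch relies on the fact that every superset of a rare itemset is also rare, so each level only needs the subsets of the rare itemsets found one level above, plus the transactions of the current size.

```python
from itertools import combinations


def rare_itemsets_top_down(transactions, min_support):
    """Return {itemset: support} for every itemset with 0 < support < min_support.

    Illustrative sketch only: a naive top-down, breadth-first, level-wise
    traversal, not the Rarity algorithm from the paper.
    """
    rows = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain the itemset.
        return sum(1 for row in rows if itemset <= row)

    rare = {}
    level = set()  # rare itemsets found at the level above (one item larger)
    largest = max((len(r) for r in rows), default=0)

    # Walk the lattice from the largest itemsets down to single items.
    for size in range(largest, 0, -1):
        # Candidates: transactions of this size, plus the subsets of this
        # size of the rare itemsets found one level above.
        candidates = {r for r in rows if len(r) == size}
        for parent in level:
            candidates.update(frozenset(c) for c in combinations(parent, size))

        level = set()
        for cand in candidates:
            s = support(cand)  # always >= 1: candidates come from transactions
            if s < min_support:
                rare[cand] = s
                level.add(cand)
            # Frequent candidates are dropped: all their subsets are frequent too.
    return rare


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a"}, {"a", "b"}]
    for itemset, count in rare_itemsets_top_down(data, min_support=2).items():
        print(sorted(itemset), count)
```

Keeping a whole level of candidates in memory at once reflects the trade-off mentioned in the summary: the traversal avoids visiting frequent regions of the lattice, but the candidate sets near the top can be large.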

MSC:

68P20 Information storage and retrieval of data

Software:

LCM; RP-Tree; UCI-ml
