×

Trie: An alternative data structure for data mining algorithms. (English) Zbl 1074.68013

Summary: Frequent itemset mining is one of the most important data mining fields. Most algorithms are APRIORI based, where hash-trees are used extensively to speed up the search for itemsets. In this paper, we demonstrate that a version of the trie data structure outperforms hash-trees in some data mining applications.
Tries appear to offer simpler and scalable algorithms which turned out to be faster for lower support thresholds. To back up our claims, we present test results based on real-life datasets.

MSC:

68P05 Data structures
68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI

References:

[1] Agrawal, R.; Imielinski, T.; Swami, A., Mining association rules between sets of items in large databases, (Proc. of the ACM SIGMOD Conference on Management of Data (1993)), 207-216
[2] Agrawal, R.; Srikant, R., Fast algorithms for mining association rules, (The International Conference on Very Large Databases (1994)), 487-499
[3] Chen, M. S.; Park, J. S.; Yu, P. S., An effective hash based algorithm for mining association rules, (Proc. of ACM SIGMOD International Conference Management of Data (1995))
[4] Brin, S.; Motwani, R.; Ullman, J. D.; Tsur, S., Dynamic itemset counting and implication rules for market basket data, SIGMOD Record (ACM Special Interest Group on Management of Data), 26, 2, 255-264 (1997)
[5] Sarasere, A.; Omiecinsky, E.; Navathe, S., An efficient algorithm for mining association rules in large databases, (Proc. 21st International Conference on Very Large Databases (VLDB). Proc. 21st International Conference on Very Large Databases (VLDB), Zurich, Switzerland. Proc. 21st International Conference on Very Large Databases (VLDB). Proc. 21st International Conference on Very Large Databases (VLDB), Zurich, Switzerland, Gatech Technical Report No. GIT-CC-95-04 (1995))
[6] Toivonen, H., Sampling large databases for association rules, The VLDB Journal, 134-145 (1996)
[7] Srikant, R.; Agrawal, R., Mining generalized association rules, (Proc. of the \(21^{st}\) International Conference on Very Large Databases (VLDB). Proc. of the \(21^{st}\) International Conference on Very Large Databases (VLDB), Zurich, Switzerland (1995))
[8] Han, J.; Fu, Y., Discovery of multiple-level association rules from large databases, (Proc. of the \(21^{st}\) International Conference on Very Large Databases (VLDB). Proc. of the \(21^{st}\) International Conference on Very Large Databases (VLDB), Zurich, Switzerland (1995))
[9] Fu, Y., Discovery of multiple-level rules from large databases (1996)
[10] Cheung, D. W.L.; Han, J.; Ng, V.; Wong, C. T., Maintenance of discovered association rules in large databases: An incremental updating technique, ICDE, 106-114 (1996)
[11] Cheung, D. W.L.; Lee, S. D.; Kao, B., A general incremental technique for maintaining discovered association rules, Database Systems for Advanced Applications, 185-194 (1997)
[12] S. Thomas, S. Bodagala, K. Alsabti and S. Ranka, An efficient algorithm for the incremental updation of association rules in large databases, 263.; S. Thomas, S. Bodagala, K. Alsabti and S. Ranka, An efficient algorithm for the incremental updation of association rules in large databases, 263.
[13] Ayan, N. F.; Tansel, A. U.; Arkun, M. E., An efficient algorithm to update large itemsets with early pruning, Knowledge Discovery and Data Mining, 287-291 (1999)
[14] Agrawal, R.; Srikant, R., Mining sequential patterns, (Yu, P. S.; Chen, A. L.P., Proc. \(11^{th}\) Int. Conf. Data Engineering, ICDE (1995), IEEE Press), 3-14
[15] Srikant, R.; Agrawal, R., Mining sequential patterns: Generalizations and performance improvements (1995), IBM Almaden Research Center: IBM Almaden Research Center San Jose, CA, Technical Report
[16] Mannila, H.; Toivonen, H.; Verkamo, A. I., Discovering frequent episodes in sequences (1995)
[17] Knuth, D. E., (The Art of Computer Programming, Volume 3 (1968), Addison-Wesley) · Zbl 0191.17903
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.