
Mining High Utility Itemsets in massive transactional datasets. (English) Zbl 1265.68044

Summary: Mining high utility itemsets (HUIs) from a transaction database means finding the itemsets whose utility is beyond a user-specified threshold. Existing HUI-mining algorithms suffer from several problems when applied to massive transactional datasets. One major problem is the high memory requirement: the very large data structure they build is assumed to fit in the computer's main memory. This paper proposes a new disk-based HUI-mining algorithm, which achieves its efficiency by applying three new ideas. First, the transactional data is converted into a new database layout, called a transactional array, that avoids multiple scans of the database during the mining phase. Second, for each frequent item, a relatively small independent tree is built to summarize co-occurrences. Finally, a simple, non-recursive mining process reduces the memory requirements because only minimal candidate generation and counting are needed. We have tested our algorithm on several very large transactional databases, and the results show that our algorithm works efficiently.
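
To make the problem statement concrete, the following is a minimal Python sketch of what "utility beyond a user-specified threshold" can mean. It is not the disk-based algorithm proposed in the paper: it is a brute-force, in-memory enumeration for illustration only. The toy data, the names `transactions`, `unit_profit`, `itemset_utility` and `high_utility_itemsets`, the quantity-times-profit utility model, and the inclusive threshold comparison are all assumptions that do not come from the summary.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"a": 1, "b": 2},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 3},
]
# Hypothetical per-unit profit of each item.
unit_profit = {"a": 5, "b": 2, "c": 1}


def itemset_utility(itemset, transactions, unit_profit):
    """Sum quantity * unit profit over every transaction containing the whole itemset."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * unit_profit[item] for item in itemset)
    return total


def high_utility_itemsets(transactions, unit_profit, min_utility):
    """Brute-force enumeration of all itemsets; exponential in the number of items."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility = itemset_utility(itemset, transactions, unit_profit)
            # Assumption: the threshold is treated as inclusive.
            if utility >= min_utility:
                result[itemset] = utility
    return result


if __name__ == "__main__":
    for itemset, utility in high_utility_itemsets(transactions, unit_profit, 15).items():
        print(itemset, utility)
```

Because this enumeration visits every subset of items and keeps all transactions in memory, it is feasible only for tiny inputs, which is the kind of limitation that motivates disk-based approaches for massive datasets.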

MSC:

68P10 Searching and sorting
68P20 Information storage and retrieval of data
Full Text: DOI