×

Efficient algorithms to identify periodic patterns in multiple sequences. (English) Zbl 1448.68480

Summary: Periodic pattern mining is a popular data mining task, which consists of identifying patterns that periodically appear in data. Traditional periodic pattern mining algorithms are designed to find patterns in a single sequence. However, in several domains, it is desirable to discover patterns that are periodic in many sequences. An example of such application is market basket analysis. Given a database of sequences of transactions made by customers, discovering sets of items that are periodically bought by customers can help understand customer behavior. To discover periodic patterns common to multiple sequences, this paper extends the traditional problem of mining periodic patterns in a sequence. Two novel measures are defined called the standard deviation of periods and the sequence periodic ratio. Two algorithms are proposed to mine these patterns efficiently called MPFPS\(_{\text{BFS}}\) and MPFPS\(_{\text{DFS}}\), which perform a breadth-first search and depth-first search, respectively. Because the sequence periodic ratio is neither monotone nor anti-monotone, these algorithms rely on a novel upper-bound called boundRa and two novel search space pruning properties to find periodic patterns efficiently. The algorithms have been evaluated on multiple datasets. Results show that they are efficient and can filter numerous non periodic itemsets to identify periodic patterns.

MSC:

68W32 Algorithms on strings
68T10 Pattern recognition, speech recognition

Software:

ERMiner; PrefixSpan; SPMF; LCM
Full Text: DOI

References:

[1] Agrawal, R.; Srikant, R., Fast algorithms for mining association rules, Proceedings of the Twentieth International Conference on Very Large Data Bases, 487-499 (1994), Morgan Kaufmann: Morgan Kaufmann Santiago de Chile
[2] Agrawal, R.; Srikant, R., Mining sequential patterns, Proceedings of the Eleventh International Conference on Data Engineering, 3-14 (1995), IEEE Press: IEEE Press Taipei
[3] Amphawan, K.; Surarerks, A.; Lenca, P., Mining periodic-frequent itemsets with approximate periodicity using interval transaction-ids list tree, Proceedings of the Third International Conference on Knowledge Discovery and Data Mining, 245-248 (2010)
[4] Amphawan, K.; Lenca, P.; Surarerks, A., Mining top-k periodic-frequent pattern from transactional databases without support threshold, Proceedings of the First International Conference on Advances in Information Technology, 18-29 (2009), Springer: Springer Bangkok
[5] Ayres, J.; Flannick, J.; Gehrke, J.; Yiu, T., Sequential pattern mining using a bitmap representation, Proceedings of the Eighth ACM SIGKDD International Conference Knowledge Discovery and Data Mining, 429-435 (2012), ACM: ACM Edmonton
[6] Berlingerio, M.; Pinelli, F.; Calabrese, F., ABACUS: frequent pattern mining-based community discovery in multidimensional networks, J. Data Min. Knowl. Discov., 26, 294-320 (2013) · Zbl 1281.68173
[7] Cao, H.; Cheung, D. W.; Mamoulis, N., Discovering partial periodic patterns in discrete data sequences, Proceedings of the Eighth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 653-658 (2004), Springer: Springer Sydney
[8] Dinh, T.; Huynh, V. N.; Lee, B., Mining periodic high utility sequential patterns, Proceedings of the Ninth Asian Conference on Intelligent Information and Database Systems, 545-555 (2017), Springer: Springer Kanazawa
[9] Duan, Y.; Fu, X.; Luo, B.; Wang, Z.; Shi, J.; Du, X., Detective: automatically identify and analyze malware processes in forensic scenarios via DLLs, Proceedings of the International Conference on Communications, 5691-5696 (2015), IEEE: IEEE London
[10] Fan, Y.; Ye, Y.; Chen, L., Malicious sequential pattern mining for automatic malware detection, J. Expert Syst. Appl., 52, 16-25 (2016)
[11] Fernando, B.; Elisa, F.; Tinne, T., Effective use of frequent itemset mining for image classification, Proceedings of the Twelfth European Conference on Computer Vision, 214227 (2012), Springer: Springer Florence
[12] Fidino, M.; Magle, S. B., Using fourier series to estimate periodic patterns in dynamic occupancy models, Ecosphere, 8, 9 (2017)
[13] Fournier-Viger, P.; Gomariz, A.; Campos, M.; Thomas, R., Fast vertical mining of sequential patterns using co-occurrence information, Proceedings of the Eighteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 40-52 (2014), Springer: Springer Tainan
[14] Fournier-Viger, P.; Gomariz, A.; Gueniche, T.; Soltani, A.; Wu., C.; Tseng, V. S., SPMF: a java open-source pattern mining library, J. Mach. Learn. Res., 15, 1, 3389-3393 (2014) · Zbl 1310.68178
[15] Fournier-Viger, P.; Gueniche, T.; Zida, S.; Tseng, V. S., ERMiner: sequential rule mining using equivalence classes, Proceedings of the Thirteenth International Symposium on Intelligent Data Analysis, 108-119 (2014), Springer: Springer Leuven
[16] Fournier-Viger, P.; Li, Z.; Lin, J. C.W.; Kiran, R. U.; Fujita, H., Discovering periodic patterns common to multiple sequences, Proceedings of the Twentieth International Conference on Big Data Analytics and Knowledge Discovery, 231-246 (2018)
[17] Fournier-Viger, P.; Lin, C.-W.; Duong, Q.-H.; Dam, T.-L., PHM: mining periodic high-utility itemsets, Proceedings of the Sixteenth Industrial Conference on Data Mining, 64-79 (2016), Springer: Springer New York
[18] Fournier-Viger, P.; Lin, C.-W.; Duong, Q.-H.; Dam, T.-L.; Sevcic, L.; Uhrin, D.; Voznak, M., PFPM: discovering periodic frequent patterns with novel periodicity measures, Proceedings of the Second Czech-China Scientific Conference, 1-13 (2017)
[19] Fournier-Viger, P.; Lin, J. C.-W.; Kiran, R. U.; Koh, Y. S.; Thomas, R., A survey of sequential pattern mining, J. Data Sci. Pattern Recognit., 1, 1, 54-77 (2017)
[20] Fournier-Viger, P.; Lin, J. C.W.; Vo, B.; Chi, T. T.; Zhang, J.; Le, H. B., A survey of itemset mining, J. WIREs Data Min. Knowl. Discov., 7, 4, 18 (2017)
[21] Fournier-Viger, P.; Wu, C. W.; Tseng, V. S.; Cao, L.; Nkambou, R., Mining partially-ordered sequential rules common to multiple sequences, IEEE Trans. Knowl. Data Eng., 27, 8, 2203-2216 (2015)
[22] Fournier-Viger, P.; Nkambou, R.; Mephu Nguifo, E., A knowledge discovery framework for learning task models from user interactions in intelligent tutoring systems, Proceedings of the Seventh Mexican International Conference on Artificial Intelligence, 765-778 (2008), Springer: Springer Atizapn de Zaragoza
[23] Gouda, K.; Hassaan, M.; Zaki, M. J., Prism: an effective approach for frequent sequence mining via prime-block encoding, J. Comput. Syst. Sci., 76, 1, 88-102 (2010) · Zbl 1182.68172
[24] Halder, S.; Samiullah, M.; Lee, Y.-K., Supergraph based periodic pattern mining in dynamic social networks, J. Expert Syst. Appl., 72, 430-442 (2017)
[25] Han, J.; Pei, J.; Mortazavi-Asl, B.; Chen, Q.; Dayal, U.; Hsu, M., FreeSpan: frequent pattern-projected sequential pattern mining, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 355-359 (2000), ACM: ACM NewYork
[26] Han, J.; Pei, J.; Yin, Y., Mining frequent patterns without candidate generation, Proceedings of the ACM SIGMOD International Conference on Management of data, 1-12 (2000), ACM: ACM Dallas
[27] Han, J.; Pei, J.; Mortazavi-Asl, B.; Chen, Q.; Dayal, U.; Hsu, M., PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 355-359 (2000), ACM: ACM Boston
[28] Harms, S. K.; Deogun, J.; Tadesse, T., Discovering sequential association rules with constraints and time lags in multiple sequences, Proceedings of the Thirteenth International Symposium Methodologies on Intelligent Systems, 373-376 (2002), Springer: Springer North-Holland · Zbl 1049.68629
[29] Kiran, R. U.; Kitsuregawa, M.; Reddy, P. K., Efficient discovery of periodic-frequent patterns in very large databases, J. Syst. Softw., 112, 110-121 (2016)
[30] Kiran, R. U.; Reddy, P. K., Mining rare periodic-frequent patterns using multiple minimum supports, Proceedings of the Fifteenth International Conference on Management of Data, 7-8 (2009), IEEE: IEEE Mysore
[31] Kiran, R. U.; Reddy, P. K., An alternative interestingness measure for mining periodic-frequent patterns, Proceedings of the Sixteenth International Conference on Database Systems for Advanced Applications, 183-192 (2011), HongKong: HongKong HongKong
[32] Li, D.; Deogun, J. S., Discovering partial periodic sequential association rules with time lag in multiple sequences for prediction, Proceedings of the Eleventh International Symposium on Methodologies for Intelligent Systems, 332-341 (2005), Springer: Springer New York · Zbl 1132.68570
[33] Lin, J. C.-W.; Gan, W.; Fournier-Viger, P.; Chao, H.-C.; T., W. J.; Zhan, J., Extracting recent weighted-based patterns from uncertain temporal databases, Eng. Appl. Artif. Intell., 61, 161-172 (2017)
[34] Liu, Y.; Zhao, Y.; Chen, L.; Pei, J.; Han, J., Mining frequent trajectory patterns for activity monitoring using radio frequency tag arrays, J. Trans. Parallel Distrib. Syst., 23, 11, 2138-2149 (2012)
[35] Lo, D.; Ramalingam, G.; Ranganath, V. P.; Vaswani, K., Mining quantified temporal rules: formalism, algorithms, and evaluation, Proceedings of the Sixteenth Working Conference on Reverse Engineering, 62-71 (2009), IEEE: IEEE Lille
[36] Mannila, H.; Toivonen, H.; Verkamo, A. I., Discovery of frequent episodes in event sequences, J. Data Min. Knowl. Discov., 1, 3, 259-289 (1997)
[37] Mwamikazi, E.; Fournier-Viger, P.; Moghrabi, C.; Baudouin, R., A dynamic questionnaire to further reduce questions in learning style assessment, Proceedings of the Tenth International Conference on Artificial Intelligence Applications and Innovations, 224-235 (2014), Springer: Springer Rhodes
[38] Nishi, M. A.; Ahmed, C. F.; Samiullah, M.; Jeong, B., Effective periodic pattern mining in time series databases, J. Expert Syst. Appl., 40, 3015-3027 (2013)
[39] Pokou, J. M.; Fournier-Viger, P.; Moghrabi, C., Authorship attribution using small sets of frequent part-of-speech skip-grams, Proceedings of the Twenty Ninth International Florida Artificial Intelligence Research Society Conference, 86-91 (2016), AAAI: AAAI Key Largo
[40] Schweizer, D.; Zehnder, M.; Wache, H.; Witschel, H. F.; Zanatta, D.; Rodriguez, M., Using consumer behavior data to reduce energy consumption in smart homes: applying machine learning to save energy without lowering comfort of inhabitants, Proceedings of the Fourteenth International Conference on Machine Learning and Applications, 1123-1129 (2015), IEEE: IEEE Miami
[41] Srikant, R.; Agrawal, R., Mining sequential patterns: generalizations and performance improvements, Proceedings of the EDBT Advances in Database Technology, 1-17 (1996)
[42] Surana, A.; Kiran, R. U.; Reddy, P. K., An efficient approach to mine periodic-frequent patterns in transactional databases, Proceedings of the Sixteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 254-266 (2011), Springer: Springer Kuala Lumpur
[43] Syed, T.; Chowdhury, A.; Byeong, J.; Young, C., Discovering periodic-frequent patterns in transactional databases, Proceedings of the Thirteenth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 242-253 (2009), Springer: Springer Bangkok
[44] Uno, T.; Kiyomi, M.; Arimura, H., LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, Proceedings of the Fourth International Conference on Data Mining Workshop on Frequent Itemset Mining Implementations, IEEE, CEUR (2004)
[45] Wang, J.; Han, J.; Li, C., Frequent closed sequence mining without candidate maintenance, J. IEEE Trans. Knowl. Data Eng., 19, 8, 1042-1056 (2007)
[46] Zaki, M. J.; Gouda, K., Fast vertical mining using diffsets, Proceedings of the Ninth ACM SIGKDD Intern. Conf. on Knowledge Discovery and Data Mining, 326-335 (2003), ACM: ACM Washington
[47] Zhang, C.; Liu, C.; Zhang, X.; Almpanidis, G., An up-to-date comparison of state-of-the-art classification algorithms, Expert Syst. Appl., 82, 128-150 (2017)
[48] Ziebarth, S.; Chounta, I. A.; Hoppe, H. U., Resource access patterns in exam preparation activities, Proceedings of the Fifth European Conference on Technology Enhanced Learning, 497-502 (2015), Springer: Springer Toledo
[49] Zimmermann, A., Understanding episode mining techniques: benchmarking on diverse, realistic, artificial data, J. Intell. Data Anal., 18, 5, 761-791 (2014)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.