×

e-NSP: efficient negative sequential pattern mining. (English) Zbl 1352.68219

Summary: As an important tool for behavior informatics, negative sequential patterns (NSP) (such as missing medical treatments) are critical and sometimes much more informative than positive sequential patterns (PSP) (e.g. using a medical service) in many intelligent systems and applications such as intelligent transport systems, healthcare and risk management, as they often involve non-occurring but interesting behaviors. However, discovering NSP is much more difficult than identifying PSP due to the significant problem complexity caused by non-occurring elements, high computational cost and huge search space in calculating negative sequential candidates (NSC). So far, the problem has not been formalized well, and very few approaches have been proposed to mine for specific types of NSP, which rely on database re-scans after identifying PSP in order to calculate the NSC supports. This has been shown to be very inefficient or even impractical, since the NSC search space is usually huge. This paper proposes a very innovative and efficient theoretical framework: set theory-based NSP mining (ST-NSP), and a corresponding algorithm, e-NSP, to efficiently identify NSP by involving only the identified PSP, without re-scanning the database. Accordingly, negative containment is first defined to determine whether a data sequence contains a negative sequence based on set theory. Second, an efficient approach is proposed to convert the negative containment problem to a positive containment problem. The NSC supports are then calculated based only on the corresponding PSP. This not only avoids the need for additional database scans, but also enables the use of existing PSP mining algorithms to mine for NSP. Finally, a simple but efficient strategy is proposed to generate NSC. Theoretical analyses show that e-NSP performs particularly well on datasets with a small number of elements in a sequence, a large number of itemsets and low minimum support. e-NSP is compared with two currently available NSP mining algorithms via intensive experiments on three synthetic and six real-life datasets from aspects including data characteristics, computational costs and scalability. e-NSP is tens to thousands of times faster than baseline approaches, and offers a sound and effective approach for efficient mining of NSP in large scale datasets by directly using existing PSP mining algorithms.

MSC:

68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)

References:

[2] Agrawal, R.; Srikant, R., Mining sequential patterns, (ICDE’95 (1995)), 3-14
[3] Algarni, A.; Zhong, N., Mining positive and negative patterns for relevance feature discovery, (KDD’2010 (2010)), 753-762
[4] Ayres, J.; Flannick, J.; Gehrke, J.; Yiu, T., Sequential pattern mining using a bitmap representation, (KDD’02 (2002)), 429-435
[5] Bilal, A.; Erhan, A., An efficient genetic algorithm for automated mining of both positive and negative quantitative association rules, Soft Comput., 10, 3, 230-237 (2005)
[6] Cao, L., In-depth behavior understanding and use, Inf. Sci., 180, 17, 3067-3085 (2010)
[7] (Cao, L.; Yu, P., Behavior Computing: Modeling, Analysis, Mining and Decision (2012), Springer)
[8] Cao, L.; Yu, P. S.; Kumar, V., Nonoccurring behavior analytics: a new area, IEEE Intell. Syst., 30, 6, 4-11 (2015)
[9] Cao, L.; Ou, Y.; Yu, P. S., Coupled behavior analysis with applications, IEEE Trans. Knowl. Data Eng., 24, 8, 1378-1392 (2012)
[10] Cao, L.; Yu, P.; Zhao, Y.; Zhang, C., Domain Driven Data Mining (2010), Springer · Zbl 1194.68187
[11] Hsueh, S. C.; Lin, M.-Y.; Chen, C.-L., Mining negative sequential patterns for e-commerce recommendations, (APSCC’08 (2008), IEEE), 1213-1218
[12] Dong, X.; Zheng, Z.; Cao, L.; Zhao, Y.; Zhang, C.; Li, J.; Wei, W.; Ou, Y., e-NSP: efficient negative sequential pattern mining based on identified positive patterns without database rescanning, (CIKM’2011 (2011), ACM), 825-830
[13] Dong, X.; Sun, F.; Han, X.; Hou, R., Study of positive and negative association rules based on multi-confidence and chi-squared test, (ADMA06. ADMA06, LNCS, vol. 4093 (2006), Springer-Verlag: Springer-Verlag Berlin-Heidelberg), 100-109
[14] Dong, X.; Gong, Y.; Zhao, L., Comparisons of typical algorithms in negative sequential pattern mining, (2014 IEEE Workshop on Electronics, Computer and Applications (2014)), 387-390
[15] Han, J.; Pei, J.; Mortazavi-Asl, B.; Chen, Q.; Dayal, U.; Hsu, M. C., Freespan: frequent pattern-projected sequential pattern mining, (KDD’00 (2000)), 355-359
[16] Gong, Y.; Liu, C.; Dong, X., Research on typical algorithms in negative sequential pattern mining, Open Automat. Control Syst. J., 7, 934-941 (2015)
[17] Kamepalli, S.; Kurra, R., Frequent negative sequential patterns: a survey, Int. J. Comput. Eng. Technol., 5, 3, 15-121 (2014)
[18] Kazienko, P., Mining sequential patterns with negative conclusions, (DaWaK’2008 (2008)), 423-432
[19] Kazienko, P., Filtering of web recommendation lists using positive and negative usage patterns, (KES’2007 (2007), Springer), 1016-1023
[20] Khare, V. K.; Rastogi, V., Mining positive and negative sequential pattern in incremental transaction databases, Int. J. Comput. Appl., 71, 1, 18-22 (2013)
[21] Lin, N. P.; Chen, H. J.; Hao, W. H., Mining negative sequential patterns, (WSEAS’2007 (2007)), 654-658
[22] Lin, N. P.; Chen, H. J.; Hao, W. H., Mining negative fuzzy sequential patterns, (Proceedings of the 7th WSEAS International Conference on Simulation, Modeling and Optimization (2007)), 52-57
[23] Lin, N. P.; Chen, H. J.; Hao, W. H., Mining strong positive and negative sequential patterns, WSEAS Trans. Comput., 3, 7, 119-124 (2008)
[24] Lin, N. P.; Hao, W. H.; Chen, H. J.; Chang, C. I.; Chueh, H. E., An algorithm for mining strong negative fuzzy sequential patterns, Int. J. Comput., 3, 1, 167-172 (2007)
[25] Mesbah, S.; Taghiyareh, F., A new sequential classification to assist Ad auction agent in making decisions, (5th International Symposium on Telecommunications (2010)), 1006-1012
[26] Ouyang, W. M.; Huang, Q. H., Mining negative sequential patterns in transaction databases, (ICMLC’2007 (2007)), 830-834
[27] Ouyang, W. M.; Huang, Q. H.; Luo, S., Mining positive and negative fuzzy sequential patterns in large transaction databases, (FSKD’2008 (2008)), 18-23
[28] Ouyang, W. M.; Huang, Q. H., Mining positive and negative fuzzy multiple level sequential patterns in large transaction databases, (GCIS’2009 (2009)), 500-504
[29] Ouyang, W. M.; Huang, Q. H., Mining positive and negative sequential patterns with multiple minimum supports in large transaction databases, (GCIS’2010 (2010)), 190-193
[30] Pei, J.; Han, J.; Mortazavi-Asl, B.; Pinto, H.; Chen, Q.; Dayal, U.; Hsu, M. C., Prefixspan: mining sequential patterns efficiently by prefix-projected pattern growth, (ICDE’2001 (2001)), 215-224
[31] Rastogi, V.; Khare, V. K., Apriori based mining positive and negative frequent sequential patterns, Int. J. Latest Trends Eng. Technol., 1, 3, 24-33 (2012)
[32] Srikant, R.; Agrawal, R., Mining sequential patterns: generalizations and performance improvements, (EDBT’1996, vol. 1057 (1996)), 1-17
[33] Wu, X.; Zhang, C.; Zhang, S., Efficient mining of both positive and negative association rules, ACM Trans. Inf. Syst., 22, 381-405 (2004)
[34] Zaki, M. J., Spade: an efficient algorithm for mining frequent sequences, Mach. Learn., 42, 31-60 (2001) · Zbl 0970.68052
[35] Zhao, Y.; Zhang, H.; Cao, L.; Zhang, C.; Bohlscheid, H., Efficient mining of event-oriented negative sequential rules, (WI-IAT’2008, vol. 1 (2008), IEEE/WIC/ACM), 336-342
[36] Zhao, Y.; Zhang, H.; Cao, L.; Zhang, C.; Bohlscheid, H., Mining both positive and negative impact-oriented sequential rules from transactional data, (PAKDD’09. PAKDD’09, LNCS, vol. 5476 (2009)), 656-663
[37] Zhao, Y.; Zhang, H. F.; Wu, S. S.; Pei, J.; Cao, L. B.; Zhang, C. Q.; Bohlscheid, H., Debt detection in social security by sequence classification using both positive and negative patterns, (ECML/PKDD’2009. ECML/PKDD’2009, LNAI, vol. 5782, Part 2 (2009)), 648-663
[38] Zheng, Z.; Zhao, Y.; Zuo, Z.; Cao, L., Negative-GSP: an efficient method for mining negative sequential patterns, (Data Mining and Analytics (AusDM’09), vol. 101 (2009)), 63-67
[39] Zheng, Z.; Zhao, Y.; Zuo, Z.; Cao, L., An efficient GA-based algorithm for mining negative sequential patterns, (PAKDD’10. PAKDD’10, LNCS, vol. 6118 (2010)), 262-273
[40] Liu, C.; Dong, X.; Li, C.; Li, Y., SAPNSP select actionable positive and negative sequential patterns based on a contribution metric, (International Conference on Fuzzy Systems and Knowledge Discovery (FSKD2015) (2015)), 847-851
[41] Pisharath, J.; Liu, Y.; Ozisikyilmaz, B.; Narayanan, R.; Liao, W. K.; Choudhary, A.; Memik, G., NU-MineBench Version 2.0 Data Set and Technical Report
[42] Fournier-Viger, P.; Gomariz, A.; Gueniche, T.; Soltani, A.; Wu, C.; Tseng, V. S., SPMF: a Java open-source pattern mining library, J. Mach. Learn. Res., 15, 3389-3393 (2014) · Zbl 1310.68178
[43] Zheng, Z., Negative sequential pattern mining (2011), PhD Thesis
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.