×

Interactive mining of probabilistic frequent patterns in uncertain databases. (English) Zbl 1504.68055

Summary: Many modern applications such as sensor networks produce probabilistic data. These data are collected into an uncertain database. To interpret uncertainty and to mine frequent patterns in an uncertain database, all possible certain databases are considered, which generates an exponential number of combinations and makes the mining problem highly complicated. In practice, mining is interactive, which makes the discovery of frequent itemsets in an uncertain database even more challenging. The objective of interactive mining is to shorten the time that is required to obtain the desired patterns in the iterated lengthy mining process. The time-consuming mining process in an uncertain database is exacerbated by repeated processing if the mining is performed from scratch. Therefore, we propose an interactive mining algorithm called iDIP to solve this problem. The iDIP algorithm adopts an approximation mechanism to mine the patterns and prunes candidates by using the existing patterns. Comprehensive experiments using both real and synthetic datasets show that iDIP outperforms the well-known re-mining-based MB algorithm for 4.3 times faster on average. In addition, iDIP has good linear scalability.

MSC:

68P15 Database theory
68T05 Learning and adaptive systems in artificial intelligence
68T37 Reasoning under uncertainty in the context of artificial intelligence

Software:

MayBMS
Full Text: DOI

References:

[1] Aggarwal, C. C. and Yu, P. S., A survey of uncertain data algorithms and applications, IEEE Trans. Know. Data Eng.21 (2009) 609-623.
[2] Cheng, R., Kalashnikov, D. V. and Prabhakar, S., Evaluating probabilistic queries over imprecise data, in: Proc. 29th ACM Int. Conf. on Management of Data (2003), pp. 551-562.
[3] Dalvi, N. N. and Suciu, D., Efficient query evaluation on probabilistic databases, Int. J. Very Large Data Bases16 (2007) 523-544.
[4] Huang, J., Antova, L., Koch, C. and Olteanu, D., MayBMS: a probabilistic database management system, in: Proc. 35th ACM Intl. Conf. on Management of Data (2009), pp. 1071-1074.
[5] Leung, C. K.-S., Mateo, M. A. F. and Brajczuk, D. A., A tree-based approach for frequent pattern mining from uncertain data, in: Proc. 12th Pacific-Asia Conf. Knowledge Discovery and Data Mining (2008), pp. 653-661.
[6] Khoussainova, N., Balazinska, M. and Suciu, D., Towards correcting input data errors probabilistically using integrity constraints, in: Proc. 5th ACM Intl. Workshop on Data Engineering for Wireless and Mobile Access (2006), pp. 43-50.
[7] Aggarwal, C. C., Li, Y., Wang, J. and Wang, J., Frequent pattern mining with uncertain data, in: Proc. 15th ACM Int. Conf. on Knowledge Discovery and Data Mining (2009), pp. 29-38.
[8] Chui, C.-K., Kao, B. and Hung, E., Mining frequent itemsets from uncertain data, in: Proc. 11th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (2007), pp. 281-292.
[9] Chui, C.-K. and Kao, B., A decremental approach for mining frequent itemsets from uncertain data, in: Proc. 12th Pacific-Asia Conf. on Knowledge Discovery and Data Mining (2008), pp. 64-75.
[10] Ahmed, A. U., Ahmed, C. F., Samiullah, M., Adnan, N. and Leung, Carson K.-S., Mining interesting patterns from uncertain databases, Inform. Sci.354 (2016) 60-85. · Zbl 1398.68427
[11] Bernecker, T., Kriegel, H.-P., Renz, M., Verhein, F. and Züfle, A., Probabilistic frequent itemset mining in uncertain databases, in: Proc. 15th ACM Int. Conf. on Knowledge Discovery and Data Mining (2009), pp. 119-128.
[12] Liu, C., Chen, L. and Zhang, C., Mining probabilistic representative frequent patterns from uncertain data, in: Proc. 2013 SIAM Int. Conf. on Data Mining (2013), pp. 73-81.
[13] Li, H., Zhang, Y. and Zhang, N., Discovering top-k probabilistic frequent itemsets from uncertain databases, in: Proc. 5th Int. Conf. on Information Technology and Quantitative Management (2017), pp. 1124-1132.
[14] Lin, M.-Y., Fu, C.-T. and Hsueh, S.-C., Incremental update on probabilistic frequent itemsets in uncertain databases, in: Proc. ACM 6th Int. Conf. on Ubiquitous Information Management and Communication (2012), pp. 1-8.
[15] Tang, P. and Peterson, E. A., Mining probabilistic frequent closed itemsets in uncertain databases, in: Proc. 49th ACM Annual Southeast Regional Conf. (2011), pp. 86-91.
[16] Calders, T., Garboni, C. and Goethals, B., Approximation of frequentness probability of itemsets in uncertain data, in: Proc. IEEE Int. Conf. on Data Mining (2010), pp. 749-754.
[17] Lin, M.-Y. and Lee, S.-Y., Interactive sequence discovery by incremental Mining, Inform. Sci.165 (2004) 187-205. · Zbl 1073.68716
[18] Parthasarathy, S., Zaki, M. J., Ogihara, M., and Dwarkadas, S., Incremental and interactive sequence mining, in: Proc. 8th ACM Int. Conf. on Information and Knowledge Management (1999), pp. 2-6.
[19] Jampani, R., Xu, F., Wu, M., Perez, L. L., Jermaine, C. M. and Haas, P. J., MCDB: A Monte Carlo approach to managing uncertain data, in: Proc. 34th ACM Intl. Conf. on Management of Data, (2008), pp. 687-700.
[20] A. P. Sistla, O. Wolfson, S. Chamberlain and S. Dao, Querying the uncertain position of moving objects, Temporal Database: Research and Practice (1998), pp. 310-337.
[21] Chen, G. and Wei, Q., Fuzzy association rules and the extended mining algorithms, Inform. Sci.147 (2002) 201-228. · Zbl 1033.68043
[22] Sun, L., Cheng, R., Cheung, D. W. and Cheng, J., Mining uncertain data with probabilistic guarantees, in: Proc. 16th ACM Int. Conf. on Knowledge Discovery and Data Mining (2010), pp. 273-282.
[23] Wang, L., Cheng, R., Lee, S. D. and Cheung, D. W.-L., Accelerating probabilistic frequent itemsets mining: a model-based approach, in: Proc. 19th ACM Int. Conf. on Information and Knowledge Management (2010), pp. 429-438.
[24] Sun, X., Liu, L. and Wang, S., An approximation algorithm of mining frequent itemsets from uncertain dataset, Int. J. Advancements in Computing Technology4 (2012) 42-49.
[25] Cam, L. L., An approximation theorem for the Poisson binomial distribution, Pacific J. Math.10 (1960) 1181-1197. · Zbl 0118.33601
[26] Zheng, Z., Kohavi, R. and Mason, L., Real world performance of association rule algorithms, in: Proc. 7th ACM Int. Conf. on Knowledge Discovery and Data Mining (2001), pp. 401-406.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.