Abstract
Dictionary learning for sparse coding has been successfully used in different domains, however, has never been employed for the interesting itemset mining. In this paper, we formulate an optimization problem for extracting a sparse representation of itemsets and show that the discrete nature of itemsets makes it NP-hard. An efficient approximation algorithm is presented which greedily solves maximum set cover to reduce overall compression loss. Furthermore, we incorporate our sparse representation algorithm into a layered convolutional model to learn nonredundant dictionary items. Following the intuition of deep learning, our convolutional dictionary learning approach convolves learned dictionary items and discovers statistically dependent patterns using chi-square in a hierarchical fashion; each layer having more abstract and compressed dictionary than the previous. An extensive empirical validation is performed on thirteen datasets, showing better interpretability and semantic coherence of our approach than two existing state-of-the-art methods.
Similar content being viewed by others
References
Aggarwal, C.C., Han, J.: Frequent Pattern Mining. Springer, Berlin (2014)
Agrawal, R., Srikant, R., et al.: Fast Algorithms for Mining Association Rules. Morgan Kaufmann, San Mateo (1994)
Boureau, Y.-l., Cun, Y. L., et al.: Sparse feature learning for deep belief networks. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1185–1192 (2008)
Calders, T., Goethals, B.: Non-derivable itemset mining. Data Min. Knowl. Discov. 14(1), 171–206 (2007)
Chandola, V., Kumar, V.: Summarization—compressing data into an informative representation. Knowl. Inf. Syst. 12(3), 355–378 (2007)
Coenen, F.: The lucs-kdd discretised/normalised arm and carm data library. http://www.csc.liv.ac.uk/frans/KDD/Software/LUCS-KDD-DN/DataSets/dataSets.html
Fowkes, J., Sutton, C.: A bayesian network model for interesting itemsets. In: Proceeding of European Conference Machine Learning and Knowledge Discovery in Databases, pp. 410–425. Springer, Berlin (2016)
Fowkes, J., Sutton, C.: A subsequence interleaving model for sequential pattern mining. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining, pp. 835–844. KDD, USA (2016)
Geerts, F., Goethals, B., Mielikäinen, T.: Tiling Databases, pp. 278–289. Springer, Berlin (2004)
Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. ACM SIGKDD Explor. Newsl. 11 (1), 10–18 (2009)
Han, J., Pei, J., Yin, Y.: Mining frequent patterns without candidate generation. SIGMOD Rec. 29(2), 1–12 (2000)
Kavukcuoglu, K., Sermanet, P., Boureau, Y.-L., Gregor, K., Mathieu, M., Cun, Y.L.: Learning convolutional feature hierarchies for visual recognition. In: Proceedings of Advances in Neural Information Processing Systems, pp. 1090–1098 (2010)
Lam, H.T., Mörchen, F., Fradkin, D., Calders, T.: Mining compressing sequential patterns. Stat. Anal. Data Min. 7(1), 34–52 (2014)
LeCun, Y., et al: Lenet-5, convolutional neural networks. http://yann.lecun.com/exdb/lenet (2015)
Lee, H., Battle, A., Raina, R., Ng, A.Y.: Efficient sparse coding algorithms. In: Proceedings of Advances in Neural Information Processing Systems, pp. 801–808 (2006)
Mairal, J., Bach, F., Ponce, J., Sapiro, G.: Online dictionary learning for sparse coding. In: Proceedings of International Conference on Machine Learning, pp. 689–696 (2009)
Mampaey, M., Tatti, N., Vreeken, J.: Tell me what i need to know: succinctly summarizing data with itemsets. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining, pp. 573–581 (2011)
Mampaey, M., Vreeken, J., Tatti, N.: Summarizing data succinctly with the most informative itemsets. ACM Trans. Knowl. Discov. Data 6(4), 16:1–16:42 (2012)
Mansha, S., Babar, Z., Kamiran, F., Karim, A.: Neural network based association rule mining from uncertain data. In: Proceedings of Neural Information Processing, pp. 129–136. Springer, Berlin (2016)
Mansha, S., Kamiran, F., Karim, A., Anwar, A.: A self-organizing map for identifying influentialcommunities in speech-based networks. In: Proceedings of ACM International on Conference on Information and Knowledge Management, pp. 1965–1968. CIKM (2016)
Mörchen, F., Fradkin, D.: Robust mining of time intervals with semi-interval partial order patterns. In: Proceedings of Society for Industrial and Applied Mathematics (2010)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Efficient mining of association rules using closed itemset lattices. Inf. Syst. 24(1), 25–46 (1999)
Shang, S., Ding, R., Yuan, B., Xie, K., Zheng, K., Kalnis, P.: User oriented trajectory search for trip recommendation. In: Proceedings of the 15th International Conference on Extending Database Technology, pp. 156–167. ACM (2012)
Shang, S., Chen, L., Jensen, C.S., Wen, J.-R., Kalnis, P.: Searching trajectories by regions of interest. IEEE Trans. Knowl. Data Eng. 29(7), 1549–1562 (2017)
Smets, K., Vreeken, J.: Slim: directly mining descriptive patterns. In: Proceedings of SIAM International Conference on Data Mining, pp. 236–247 (2012)
Tatti, N., Vreeken, J.: The long and the short of it: summarising event sequences with serial episodes. In: Proceedings of ACM International Conference on Knowledge Discovery and Data Mining, pp. 462–470. KDD (2012)
Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2013)
Vreeken, J., Van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Discov. 23(1), 169–214 (2011)
Wang, W., Yin, H., Sadiq, S., Chen, L., Xie, M., Spore, X. Zhou.: A sequential personalized spatial item recommender system. In: Proceedings of International Conference on Data Engineering, pp. 954–965 (2016)
Webb, G.I.: Self-sufficient itemsets: an approach to screening potentially interesting associations between items. ACM Trans. Knowl. Data Discov. 4(1), 3:1–3:20 (2010)
Webb, G.I., Vreeken, J.: Efficient discovery of the most interesting associations. ACM Trans. Knowl. Discov. Data 8(3), 15:1–15:31 (2013)
Weisstein, E.W.: Chi-squared test. From MathWorld–A Wolfram Web Resource (1999)
Wright, J., Yang, A.Y., Ganesh, A., Sastry, S.S., Ma, Y.: Robust face recognition via sparse representation. IEEE Trans. Pattern Anal. Mach. Intell. 31(2), 210–227 (2009)
Xie, K., Deng, K., Shang, S., Zhou, X., Zheng, K.: Finding alternative shortest paths in spatial networks. ACM Trans. Database Syst. 37(4), 29:1–29:31 (2012)
Xie, Q., Shang, S., Yuan, B., Pang, C., Zhang, X.: Local correlation detection with linearity enhancement in streaming data. In: Proceedings of ACM International on Conference on Information and Knowledge Management, pp. 309–318. CIKM (2013)
Xie, M., Yin, H., Wang, H., Xu, F., Chen, W., Wang, S.: Learning graph-based poi embedding for location-based recommendation. In: Proceedings of ACM International on Conference on Information and Knowledge Management, pp. 15–24 (2016)
Yang, B., Guo, C., Jensen, C.S., Kaul, M., Shang, S.: Stochastic skyline route planning under time-varying uncertainty. In: Proceedings of IEEE International Conference on Data Engineering, pp. 136–147 (2014)
Yin, H., Sun, Y., Cui, B., Hu, Z., Lcars, L. Chen.: A location-content-aware recommender system. In: Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, pp. 221–229. KDD (2013)
Yiu, M.L., Mamoulis, N., Papadias, D.: Aggregate nearest neighbor queries in road networks. IEEE Trans. Knowl. Data Eng. 17(6), 820–833 (2005)
Zaki, M.J.: Scalable algorithms for association mining. IEEE Trans. Knowl. Data Eng. 12(3), 372–390 (2000)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Proceedings of European Conference on Computer Vision, pp. 818–833. Springer, Berlin (2014)
Zeiler, M.D., Krishnan, D., Taylor, G.W., Fergus, R.: Deconvolutional networks. In: Proceeding of IEEE Conference on Computer Vision and Pattern Recognition, pp. 2528–2535 (2010)
Zhang, A., Shi, W., Webb, G.I.: Mining significant association rules from uncertain data. Data Min. Knowl. Discov. 30(4), 928–963 (2016)
Zheng, K., Shang, S., Yuan, N.J., Yang, Y.: Towards efficient search for activity trajectories. In: Proceedings of IEEE International Conference on Data Engineering, pp. 230–241 (2013)
Zheng, K., Zheng, Y., Yuan, N.J., Shang, S.: On discovery of gathering patterns from trajectories. In: Proceedings of IEEE International Conference on Data Engineering, pp. 242–253 (2013)
Zheng, K., Su, H., Zheng, B., Shang, S., Xu, J., Liu, J., Zhou, X.: Interactive top-k spatial keyword queries. In: Proceedings of IEEE International Conference on Data Engineering, pp. 423–434 (2015)
Zhu, S., Wang, Y., Shang, S., Zhao, G., Wang, J.: Probabilistic routing using multimodal data. Neurocomputing 253(C), 49–55 (2017)
Author information
Authors and Affiliations
Corresponding author
Additional information
This article belongs to the Topical Collection: Special Issue on Big Data Management and Intelligent Analytics
Guest Editors: Junping Du, Panos Kalnis, Wenling Li, and Shuo Shang
Rights and permissions
About this article
Cite this article
Mansha, S., Lam, H.T., Yin, H. et al. Layered convolutional dictionary learning for sparse coding itemsets. World Wide Web 22, 2225–2239 (2019). https://doi.org/10.1007/s11280-018-0565-2
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11280-018-0565-2