
Grafting for combinatorial binary model using frequent itemset mining. (English) Zbl 1458.68172

Summary: We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs). CBMs offer high interpretability of the acquired knowledge, but learning them naïvely from labeled data incurs a computational cost that grows exponentially with the length of the conjunctions, even though long conjunctions are exactly what make predictors effective on large-scale datasets. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns CBMs within the \(L_1\)-regularized empirical risk minimization (\(L_1\)-RERM) framework. The key idea of GRAB is to adopt weighted frequent itemset mining for the most time-consuming step of the grafting algorithm, an iterative method designed to solve large-scale \(L_1\)-RERM problems. Furthermore, we show experimentally that CBM linear predictors are effective in terms of both prediction accuracy and knowledge discovery.
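
To make the model concrete, the following is a minimal formalization reconstructed from the summary above, not quoted from the paper. A CBM is a linear predictor over conjunction features of a binary vector \(x \in \{0,1\}^d\),
\[ f(x) = \sum_{S \in \mathcal{S}} w_S \prod_{j \in S} x_j, \]
learned by \(L_1\)-regularized empirical risk minimization,
\[ \min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(y_i, f(x_i)\bigr) + \lambda \|w\|_1 . \]
Grafting keeps a small active set of features and repeatedly adds the inactive feature with the largest gradient magnitude; for CBMs this search ranges over all itemsets \(S\), which is where weighted frequent itemset mining comes in. The Python sketch below illustrates the loop under simplifying assumptions: squared loss, a plain length-capped enumeration standing in for an LCM-style weighted miner, and hypothetical function names (mine_best_conjunction, grab_sketch); it is not the authors' implementation.

import itertools
import numpy as np

def mine_best_conjunction(X, residuals, max_len):
    # Find the conjunction S maximizing |sum_i residuals[i] * prod_{j in S} X[i, j]|.
    # GRAB delegates this step to a weighted frequent itemset miner with
    # branch-and-bound pruning; plain enumeration is shown for clarity.
    n, d = X.shape
    best_grad, best_set = 0.0, None
    for length in range(1, max_len + 1):
        for S in itertools.combinations(range(d), length):
            feat = X[:, list(S)].prod(axis=1)   # 0/1 conjunction feature
            grad = residuals @ feat             # gradient of the data-fit term
            if abs(grad) > abs(best_grad):
                best_grad, best_set = grad, S
    return best_set, best_grad

def grab_sketch(X, y, lam=0.1, max_len=3, max_iters=20, lr=0.1, inner_steps=200):
    # Grafting loop: grow the active set one conjunction at a time and stop
    # when no inactive feature violates the L1 optimality (KKT) condition.
    n, _ = X.shape
    active, w = [], np.zeros(0)
    while len(active) < max_iters:
        feats = (np.column_stack([X[:, list(T)].prod(axis=1) for T in active])
                 if active else np.zeros((n, 0)))
        residuals = (feats @ w - y) / n         # squared-loss gradient weights
        S, grad = mine_best_conjunction(X, residuals, max_len)
        if S is None or abs(grad) <= lam:       # no violating feature: stop
            break
        active.append(S)
        w = np.append(w, 0.0)
        feats = np.column_stack([X[:, list(T)].prod(axis=1) for T in active])
        for _ in range(inner_steps):            # re-optimize active weights (ISTA)
            z = w - lr * (feats.T @ (feats @ w - y) / n)
            w = np.sign(z) * np.maximum(np.abs(z) - lr * lam, 0.0)
    return active, w

The point of the mining step is that the exhaustive search above is exactly a weighted frequent itemset problem: the per-example residuals act as item weights, and bounds on the achievable weighted support let an LCM-style miner prune the itemset lattice instead of enumerating it.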

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of the 20th international conference on very large databases, pp 487-499
[2] Aizenstein, H.; Pitt, L., On the learnability of disjunctive normal form formulas, Mach Learn, 19, 3, 183-208 (1995) · Zbl 0831.68094
[3] Uzilov, A. V.; Keegan, J. M.; Mathews, D. H., Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change, BMC Bioinform, 7, 1, 173 (2006) · doi:10.1186/1471-2105-7-173
[4] Baldi, P.; Sadowski, P.; Whiteson, D., Searching for exotic particles in high-energy physics with deep learning, Nat Commun, 5, 4308 (2014) · doi:10.1038/ncomms5308
[5] Bayardo RJ Jr (1998) Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD international conference on management of data, pp 85-93
[6] Bishop, C. M., Pattern recognition and machine learning (information science and statistics) (2006), Secaucus: Springer-Verlag New York Inc. · Zbl 1107.68072
[7] Breiman, L.; Friedman, J.; Stone, C. J.; Olshen, R. A., Classification and regression trees (1984), Boca Raton: CRC Press · Zbl 0541.62042
[8] Bshouty, N. H., Exact learning Boolean functions via the monotone theory, Inf Comput, 123, 1, 146-153 (1995) · Zbl 1096.68634 · doi:10.1006/inco.1995.1164
[9] Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, ACM, New York, NY, USA, KDD ’16, pp 785-794. doi:10.1145/2939672.2939785
[10] Cheng H, Yan X, Han J, Hsu CW (2007) Discriminative frequent pattern analysis for effective classification. In: Proceedings of 2007 IEEE 23rd international conference on data engineering. IEEE, pp 716-725
[11] Collobert, R.; Bengio, S.; Bengio, Y., A parallel mixture of SVMs for very large scale problems, Neural Comput, 14, 5, 1105-1114 (2002) · Zbl 1003.68135 · doi:10.1162/089976602753633402
[12] Dantzig, G. B.; Wolfe, P., Decomposition principle for linear programs, Oper Res, 8, 1, 101-111 (1960) · Zbl 0093.32806 · doi:10.1287/opre.8.1.101
[13] Desaulniers, G.; Desrosiers, J.; Solomon, M. M., Column generation (2006), Berlin: Springer
[14] Deshpande, M.; Kuramochi, M.; Wale, N.; Karypis, G., Frequent substructure-based approaches for classifying chemical compounds, IEEE Trans Knowl Data Eng, 17, 8, 1036-1050 (2005) · doi:10.1109/TKDE.2005.127
[15] Fan, R. E.; Chang, K. W.; Hsieh, C. J.; Wang, X. R.; Lin, C. J., LIBLINEAR: a library for large linear classification, J Mach Learn Res, 9, 1871-1874 (2008) · Zbl 1225.68175
[16] Guyon, I.; Gunn, S.; Ben-Hur, A.; Dror, G., Result analysis of the NIPS 2003 feature selection challenge, Adv Neural Inf Process Syst, 17, 545-552 (2005)
[17] Ho TK (1995) Random decision forests. In: Proceedings of the third international conference on document analysis and recognition, vol 1. IEEE, pp 278-282
[18] Ho, T. K., The random subspace method for constructing decision forests, IEEE Trans Pattern Anal Mach Intell, 20, 8, 832-844 (1998) · doi:10.1109/34.709601
[19] Ho TK, Kleinberg EM (1996) Building projectable classifiers of arbitrary complexity. In: Proceedings of the 13th international conference on pattern recognition, vol 2. IEEE, pp 880-885
[20] Kudo, T.; Maeda, E.; Matsumoto, Y., An application of boosting to graph classification, Adv Neural Inf Process Syst, 17, 729-736 (2004)
[21] Lichman M (2013) UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 30 Aug 2019
[22] Lundberg, S. M.; Lee, S. I., A unified approach to interpreting model predictions, Adv Neural Inf Process Syst, 30, 4765-4774 (2017)
[23] Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; Vanderplas, J.; Passos, A.; Cournapeau, D.; Brucher, M.; Perrot, M.; Duchesnay, E., Scikit-learn: machine learning in Python, J Mach Learn Res, 12, 2825-2830 (2011) · Zbl 1280.68189
[24] Perkins, S.; Lacker, K.; Theiler, J., Grafting: fast, incremental feature selection by gradient descent in function space, J Mach Learn Res, 3, 1333-1356 (2003) · Zbl 1102.68578
[25] Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Advances in kernel methods. MIT Press, Cambridge, MA, USA, pp 185-208
[26] Prokhorov D (2001) IJCNN 2001 neural network competition. In: Slide presentation in international joint conference on neural networks 2001. http://www.geocities.ws/ijcnn/nnc_ijcnn01.pdf. Accessed 30 Aug 2019
[27] Quinlan, J. R., C4.5: programs for machine learning (1993), Burlington: Morgan Kaufmann Publishers
[28] Ribeiro MT, Singh S, Guestrin C (2016) Why should I trust you? Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pp 1135-1144
[29] Rish, I.; Grabarnik, G., Sparse modeling: theory, algorithms, and applications (2014), Boca Raton: CRC Press Inc.
[30] Saigo, H.; Uno, T.; Tsuda, K., Mining complex genotypic features for predicting HIV-1 drug resistance, Bioinformatics, 23, 18, 2455-2462 (2007) · doi:10.1093/bioinformatics/btm353
[31] Schapire, R. E.; Freund, Y., Boosting: foundations and algorithms (2012), Cambridge: The MIT Press · Zbl 1278.68021
[32] Shawe-Taylor, J.; Cristianini, N., Kernel methods for pattern analysis (2004), New York: Cambridge University Press
[33] Tsuda K, Kudo T (2006) Clustering graphs by weighted substructure mining. In: Proceedings of the 23rd international conference on Machine learning, pp 953-960
[34] Uno T, Asai T, Uchida Y, Arimura H (2003) LCM: an efficient algorithm for enumerating frequent closed item sets. In: Proceedings of the third IEEE international conference on data mining workshop on frequent itemset mining implementations, available as CEUR workshop proceedings, vol 90. http://ceur-ws.org/Vol-90/. Accessed 30 Aug 2019
[35] Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: Proceedings of the fourth IEEE international conference on data mining workshop on frequent itemset mining implementations, available as CEUR workshop proceedings, vol 126. http://ceur-ws.org/Vol-126/. Accessed 30 Aug 2019
[36] Uno T, Kiyomi M, Arimura H (2005) LCM ver. 3: collaboration of array, bitmap and prefix tree for frequent itemset mining. In: Proceedings of the first international workshop on open source data mining: frequent pattern mining implementations, pp 77-86
[37] Zaki MJ, Parthasarathy S, Ogihara M, Li W (1997) New algorithms for fast discovery of association rules. In: Proceedings of the third international conference on knowledge discovery and data mining, pp 283-286
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or perfect matching.