×

kLog: a language for logical and relational learning with kernels. (English) Zbl 1405.68288

Summary: We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. It is rather a language to perform kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph – in particular, a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs as in inductive logic programming systems. The kLog framework can be applied to tackle the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report about empirical comparisons, showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.

MSC:

68T05 Learning and adaptive systems in artificial intelligence
68N17 Logic programming
68P15 Database theory

References:

[1] Dietterich, T.; Domingos, P.; Getoor, L.; Muggleton, S.; Tadepalli, P., Structured machine learning: the next ten years, Mach. Learn., 73, 3-23 (2008) · Zbl 1470.68098
[3] (De Raedt, L.; Frasconi, P.; Kersting, K.; Muggleton, S., Probabilistic Inductive Logic Programming: Theory and Applications. Probabilistic Inductive Logic Programming: Theory and Applications, Lecture Notes in Computer Science, vol. 4911 (2008), Springer: Springer Berlin) · Zbl 1132.68007
[4] (Getoor, L.; Taskar, B., Introduction to Statistical Relational Learning (2007), MIT Press: MIT Press Cambridge, MA) · Zbl 1141.68054
[5] Landwehr, N.; Passerini, A.; De Raedt, L.; Frasconi, P., Fast learning of relational kernels, Mach. Learn., 78, 3, 305-342 (2010) · Zbl 1470.68129
[6] Taskar, B.; Guestrin, C.; Koller, D., Max-margin Markov networks, (Thrun, S.; Saul, L.; Schölkopf, B., Proceedings of Neural Information Processing Systems (2003), MIT Press: MIT Press Cambridge, MA), 25-32
[7] Richardson, M.; Domingos, P., Markov logic networks, Mach. Learn., 62, 107-136 (2006) · Zbl 1470.68221
[8] Friedman, N.; Getoor, L.; Koller, D.; Pfeffer, A., Learning probabilistic relational models, (International Joint Conference on Artificial Intelligence (1999)), 1300-1309
[9] De, L., Raedt, Logical and relational learning, (Cognitive Technologies (2008), Springer: Springer New York) · Zbl 1203.68145
[10] Heckerman, D.; Meek, C.; Koller, D., Probabilistic entity-relationship models, PRMs, and plate models, (Getoor, L.; Taskar, B., Introduction to Statistical Relational Learning (2007), The MIT Press), 201-238
[11] Costa, F.; De Grave, K., Fast neighborhood subgraph pairwise distance kernel, (Proc. of the 27th International Conference on Machine Learning (ICML 2010) (2010)), 255-262
[12] Tsochantaridis, I.; Joachims, T.; Hofmann, T.; Altun, Y., Large margin methods for structured and interdependent output variables, J. Mach. Learn. Res., 6, 2, 1453-1484 (2006) · Zbl 1222.68321
[13] Ng, A. Y.; Jordan, M. I., On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes, (Dietterich, T.; Becker, S.; Ghahramani, Z., Advances in Neural Information Processing Systems (NIPS), vol. 14 (2002)), 841-848
[14] Sutton, C.; McCallum, A., An introduction to conditional random fields for relational learning, in: Getoor and Taskar [4], pp. 93-127
[15] Altun, Y.; Tsochantaridis, I.; Hofmann, T., Hidden Markov support vector machines, (Twentieth International Conference on Machine Learning (ICML 2003) (2003)), 3-10
[16] Lari, K.; Young, S., Applications of stochastic context-free grammars using the inside-outside algorithm, Comput. Speech Lang., 5, 3, 237-257 (1991)
[17] Muggleton, S., Stochastic logic programs, (De Raedt, L., Advances in Inductive Logic Programming (1996), IOS Press), 254-264
[18] Taskar, B.; Abbeel, P.; Koller, D., Discriminative probabilistic models for relational data, (Darwiche, A.; Friedman, N., Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (2002), Morgan Kaufmann: Morgan Kaufmann San Fransisco, CA), 895-902
[19] Serebrenik, A.; Schrijvers, T.; Demoen, B., Improving Prolog programs: refactoring for Prolog, Theory Pract. Log. Program., 8, 2, 201-215 (2008) · Zbl 1140.68359
[20] Argyriou, A.; Evgeniou, T.; Pontil, M., Convex multi-task feature learning, Mach. Learn., 73, 3, 243-272 (2008) · Zbl 1470.68073
[21] Costa, V. S.; Rocha, R.; Damas, L., The Yap Prolog system, Theory Pract. Log. Program., 12, 1-2, 5-34 (2012) · Zbl 1244.68017
[22] Chang, C.-C.; Lin, C.-J., LIBSVM: a library for support vector machines (2001), software available at
[23] Bottou, L., Large-scale machine learning with stochastic gradient descent, (Lechevallier, Y.; Saporta, G., Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010) (2010), Springer: Springer Paris, France), 177-187 · Zbl 1436.68293
[24] Srinivasan, A.; Muggleton, S. H.; King, R.; Sternberg, M., Mutagenesis: ILP experiments in a non-determinate biological domain, (Proceedings of the 4th International Workshop on Inductive Logic Programming. Proceedings of the 4th International Workshop on Inductive Logic Programming, GMD-Studien, vol. 237 (1994)), 217-232
[25] Wang, R.; Fu, Y.; Lai, L., A new atom-additive method for calculating partition coefficients, J. Chem. Inf. Comput. Sci., 37, 3, 615-621 (1997)
[26] Evgeniou, T.; Micchelli, C.; Pontil, M., Learning multiple tasks with kernel methods, J. Mach. Learn. Res., 6, 1, 615-637 (2006) · Zbl 1222.68197
[28] (Bakir, G. H.; Hofmann, T.; Schölkopf, B.; Smola, A. J.; Taskar, B.; Vishwanathan, S. V.N., Predicting Structured Data, Neural Information Processing (2007), The MIT Press)
[29] Joachims, T., Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms (2002), Kluwer Academic Publishers: Kluwer Academic Publishers Dordrecht
[30] Neville, J.; Jensen, D., Collective classification with relational dependency networks, (Workshop on Multi-Relational Data Mining (MRDM-2003) (2003))
[31] Frasconi, P.; Jaeger, M.; Passerini, A., Feature discovery with type extension trees, (Proc. of the 18th Int. Conf. on Inductive Logic Programming (2008), Springer), 122-139 · Zbl 1156.68531
[32] Vazquez, A.; Flammini, A.; Maritan, A.; Vespignani, A., Global protein function prediction from protein-protein interaction networks, Nat. Biotechnol., 21, 6, 697-700 (2003)
[33] Lanckriet, G.; Deng, M.; Cristianini, N.; Jordan, M.; Noble, W., Kernel-based data fusion and its application to protein function prediction in yeast, (Proceedings of the Pacific Symposium on Biocomputing, vol. 9 (2004)), 300-311
[34] Dietterich, T. G.; Lathrop, R. H.; Lozano-Pérez, T., Solving the multiple instance problem with axis-parallel rectangles, Artif. Intell., 89, 1-2, 31-71 (1997) · Zbl 1042.68650
[35] Koller, D.; Friedman, N., Probabilistic Graphical Models: Principles and Techniques, Adaptive Computation and Machine Learning (2009), MIT Press: MIT Press Cambridge, MA · Zbl 1183.68483
[36] Frasconi, P.; Gori, M.; Sperduti, A., A general framework for adaptive processing of data structures, IEEE Trans. Neural Netw., 9, 5, 768-786 (1998)
[37] Horváth, T.; Gärtner, T.; Wrobel, S., Cyclic pattern kernels for predictive graph mining, (Proceedings of KDD 04 (2004), ACM Press), 158-167
[38] Ralaivola, L.; Swamidass, S.; Saigo, H.; Baldi, P., Graph kernels for chemical informatics, Neural Netw., 18, 8, 1093-1110 (2005)
[39] Mahe, P.; Ueda, N.; Akutsu, T.; Perret, J.; Vert, J., Graph kernels for molecular structure-activity relationship analysis with support vector machines, J. Chem. Inf. Model., 45, 4, 939-951 (2005)
[40] Gärtner, T., Kernels for Structured Data, Series in Machine Perception and Artificial Intelligence, vol. 72 (2008), World Scientific: World Scientific Singapore · Zbl 1168.68039
[41] Vishwanathan, S. V.N.; Schraudolph, N. N.; Kondor, R.; Borgwardt, K. M., Graph kernels, J. Mach. Learn. Res., 99, 1201-1242 (2010) · Zbl 1242.05112
[42] Shervashidze, N.; Schweitzer, P.; Van Leeuwen, E. J.; Mehlhorn, K.; Borgwardt, K. M., Weisfeiler-Lehman graph kernels, J. Mach. Learn. Res., 12, 2539-2561 (2011) · Zbl 1280.68194
[43] McKay, B. D., Practical graph isomorphism, Congr. Numer., 30, 45-87 (1981) · Zbl 0521.05061
[44] Yan, X.; Han, J., gSpan: graph-based substructure pattern mining, (Proc. 2002 Int. Conf. Data Mining (ICDM’02) (2002)), 721-724
[45] Luks, E. M., Isomorphism of graphs of bounded valence can be tested in polynomial time, J. Comput. Syst. Sci., 25, 42-65 (1982) · Zbl 0493.68064
[46] Sorlin, S.; Solnon, C., A parametric filtering algorithm for the graph isomorphism problem, Constraints, 13, 518-537 (2008) · Zbl 1162.05337
[47] Menchetti, S.; Costa, F.; Frasconi, P., Weighted decomposition kernels, (De Raedt, L.; Wrobel, S., Proceedings of the 22nd International Conference on Machine Learning. Proceedings of the 22nd International Conference on Machine Learning, ACM International Conference Proceeding Series, vol. 119 (2005), ACM: ACM New York, NY), 585-592
[48] Kou, Z.; Cohen, W., Stacked graphical models for efficient inference in Markov random fields, (Proceedings of the Seventh SIAM International Conference on Data Mining (2007)), 533-538
[49] van de Waterbeemd, H.; Gifford, E., ADMET in silico modelling: towards prediction paradise?, Nat. Rev. Drug Discov., 2, 3, 192-204 (2003)
[50] Helma, C.; Kramer, S., A survey of the predictive toxicology challenge 2000-2001, Bioinformatics, 19, 10, 1179-1182 (2003)
[51] Helma, C., Predictive Toxicology (2005), CRC Press
[52] Ceroni, A.; Costa, F.; Frasconi, P., Classification of small molecules by two- and three-dimensional decomposition kernels, Bioinformatics, 23, 16, 2038-2045 (2007)
[53] Kazius, J.; McGuire, R.; Bursi, R., Derivation and validation of toxicophores for mutagenicity prediction, J. Med. Chem., 48, 1, 312-320 (2005)
[54] Blockeel, H.; Dzeroski, S.; Kompare, B.; Kramer, S.; Pfahringer, B.; Laer, W., Experiments in predicting biodegradability, Appl. Artif. Intell., 18, 2, 157-181 (2004)
[55] Ando, H. Y.; Dehaspe, L.; Luyten, W.; Craenenbroeck, E. V.; Vandecasteele, H.; Meervelt, L. V., Discovering H-bonding rules in crystals with inductive logic programming, Mol. Pharm., 3, 6, 665-674 (2006)
[56] De Grave, K.; Costa, F., Molecular graph augmentation with rings and functional groups, J. Chem. Inf. Model., 50, 9, 1660-1668 (2010)
[57] Wolpert, D., Stacked generalization, Neural Netw., 5, 2, 241-259 (1992)
[58] De Raedt, L., A perspective on inductive databases, ACM SIGKDD Explor. Newsl., 4, 2, 69-77 (2002)
[59] Boulicaut, J.-F.; Jeudy, B., Constraint-based data mining, (Maimon, O.; Rokach, L., Data Mining and Knowledge Discovery Handbook (2010), Springer), 339-354
[60] Mitchell, T., The discipline of machine learning (2006), Carnegie-Mellon University, Tech. Rep. CMU-ML-06-108
[61] De Raedt, L.; Nijssen, S., Towards programming languages for machine learning and data mining (extended abstract), (Kryszkiewicz, Marzena; Rybinski, Henryk; Skowron, Andrzej; Ras, Zbigniew W., Foundations of Intelligent Systems - 19th International Symposium ISMIS 2011, Warsaw, Poland, June 28-30, 2011. Proceedings. Foundations of Intelligent Systems - 19th International Symposium ISMIS 2011, Warsaw, Poland, June 28-30, 2011. Proceedings, Lecture Notes in Computer Science, vol. 6804 (2011), Springer), 25-32
[62] Rizzolo, N.; Roth, D., Learning based Java for rapid development of NLP systems, (Calzolari, Nicoletta; Choukri, Khalid; Maegaard, Bente; Mariani, Joseph; Odijk, Jan; Piperidis, Stelios; Rosner, Mike; Tapias, Daniel, Proceedings of the International Conference on Language Resources and Evaluation. Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, 17-23 May 2010, Valletta, Malta (2010))
[63] Lowd, D.; Domingos, P., Efficient weight learning for Markov logic networks, (Knowledge Discovery in Databases: PKDD 2007 (2007), Springer), 200-211
[64] Macskassy, S. A.; Provost, F., A simple relational classifier, (Proc. of the 2nd Workshop on Multi-Relational Data Mining (2003))
[65] Muggleton, S., Inverse entailment and Progol, New Gener. Comput., 13, 3-4, 245-286 (1995)
[66] Srinivasan, A., The Aleph manual (2007)
[67] Blockeel, H.; De Raedt, L., Top-down induction of first order logical decision trees, Artif. Intell., 101, 1-2, 285-297 (1998) · Zbl 0909.68034
[68] Bröcheler, M.; Mihalkova, L.; Getoor, L., Probabilistic similarity logic, (Proc. of the 26th Conference on Uncertainty in Artificial Intelligence. Proc. of the 26th Conference on Uncertainty in Artificial Intelligence, (UAI 2010) (2010))
[69] Kersting, K.; De Raedt, L., Bayesian logic programming: theory and tool, (Getoor, L.; Taskar, B., An Introduction to Statistical Relational Learning (2007), MIT Press), 291-321
[70] De Raedt, L.; Kimmig, A.; Toivonen, H., ProbLog: a probabilistic Prolog and its application in link discovery, (Proceedings of the 20th International Joint Conference on Artificial Intelligence (2007)), 2462-2467
[71] Kersting, K., Lifted probabilistic inference, (Proceedings of the 20th European Conference on Artificial Intelligence. Proceedings of the 20th European Conference on Artificial Intelligence, Front. Artif. Intell. Appl., vol. 242 (2012), IOS Press), 33-38
[72] Kersting, K.; Ahmadi, B.; Natarajan, S., Counting belief propagation, (Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI’09 (2009), AUAI Press: AUAI Press Arlington, Virginia, United States), 277-284
[73] Gärtner, T., A survey of kernels for structured data, ACM SIGKDD Explor. Newsl., 5, 1, 49-58 (2003)
[74] Wachman, G.; Khardon, R., Learning from interpretations: a rooted kernel for ordered hypergraphs, (Proceedings of the 24th International Conference on Machine Learning (2007)), 943-950
[75] Kramer, S.; Lavrač, N.; Flach, P., Propositionalization approaches to relational data mining, (Džeroski, S.; Lavrač, N., Relational Data Mining (2001), Springer), 262-291 · Zbl 1003.68039
[76] Quinlan, J. R., Learning logical definitions from relations, Mach. Learn., 5, 239-266 (1990)
[77] Rossi, R. A.; McDowell, L. K.; Aha, D. W.; Neville, J., Transforming graph data for statistical relational learning, J. Artif. Intell. Res., 45, 1, 363-441 (2012) · Zbl 1253.68285
[78] Lao, N.; Cohen, W. W., Relational retrieval using a combination of path-constrained random walks, Mach. Learn., 81, 1, 53-67 (2010) · Zbl 1470.68130
[79] Cook, D. J.; Holder, L. B., Mining Graph Data (2006), Wiley-Interscience
[80] Sun, Y.; Han, J., Mining heterogeneous information networks: principles and methodologies, Synth. Lect. Data Min. Knowl. Discov., 3, 2, 1-159 (2012)
[81] Chang, M.; Ratinov, L.; Rizzolo, N.; Roth, D., Learning and inference with constraints, (Proceedings of AAAI (2008)), 1513-1518
[82] McCallum, A.; Schultz, K.; Singh, S., FACTORIE: probabilistic programming via imperatively defined factor graphs, (Neural Information Processing Systems (NIPS) (2009)), 1249-1257
[83] Verbeke, M.; Van Asch, V.; Morante, R.; Frasconi, P.; Daelemans, W.; De Raedt, L., A statistical relational learning approach to identifying evidence based medicine categories, (Proc. of EMNLP-CoNLL (2012)), 579-589
[84] London, B.; Huang, B.; Taskar, B.; Getoor, L., Collective stability in structured prediction: generalization from one example, (Dasgupta, S.; Mcallester, D., Proceedings of the 30th International Conference on Machine Learning (ICML-13), JMLR Workshop and Conference Proceedings (2013)), 828-836
[85] Kimmig, A.; Demoen, B.; De Raedt, L.; Costa, V. S.; Rocha, R., On the implementation of the probabilistic logic programming language ProbLog, Theory Pract. Log. Program., 11, 2-3, 235-262 (2011) · Zbl 1220.68037
[86] Verbeke, M.; Frasconi, P.; Van Asch, V.; Morante, R.; Daelemans, W.; De Raedt, L., Kernel-based logical and relational learning with kLog for hedge cue detection, (Muggleton, S. H.; Tamaddoni-Nezhad, A.; Lisi, F. A., Inductive Logic Programming: Revised Selected Papers from the 21st International Conference (ILP 2011). Inductive Logic Programming: Revised Selected Papers from the 21st International Conference (ILP 2011), Lecture Notes in Artificial Intelligence (2012)), 347-357
[87] Kordjamshidi, P.; Frasconi, P.; Van Otterlo, M.; Moens, M.; De Raedt, L., Spatial relation extraction using relational learning, (Muggleton, S. H.; Tamaddoni-Nezhad, A.; Lisi, F. A., Inductive Logic Programming: Revised Selected Papers from the 21st International Conference (ILP 2011). Inductive Logic Programming: Revised Selected Papers from the 21st International Conference (ILP 2011), Lecture Notes in Artificial Intelligence (2012)), 204-220
[88] Antanas, L.; Frasconi, P.; Costa, F.; Tuytelaars, T.; De Raedt, L., A relational kernel-based framework for hierarchical image understanding, (Gimel’farb, G. L.; Hancock, E. R.; Imiya, A. I.; Kuijper, A.; Kudo, M.; Shinichiro Omachi, S.; Windeatt, T.; Yamada, K., Lecture Notes in Computer Science (2012), Springer), 171-180
[89] Antanas, L.; Hoffmann, M.; Frasconi, P.; Tuytelaars, T.; Raedt, L. D., A relational kernel-based approach to scene classification, (WACV (2013)), 133-139
[90] Antanas, L.; Frasconi, P.; Tuytelaars, T.; De Raedt, L., Employing logical languages for image understanding, (IEEE Workshop on Kernels and Distances for Computer Vision, International Conference on Computer Vision. IEEE Workshop on Kernels and Distances for Computer Vision, International Conference on Computer Vision, Barcelona, Spain, 13 November 2011 (Nov. 2011))
[91] Gross, J. L.; Yellen, J., Graph Theory and Its Applications, Discrete Mathematics and Its Applications (2005), Chapman & Hall/CRC
[92] Haussler, D., Convolution kernels on discrete structures (1999), UCSC-CRL, Tech. Rep. 99-10
[93] Shi, Q.; Petterson, J.; Dror, G.; Langford, J.; Smola, A.; Vishwanathan, S., Hash kernels for structured data, J. Mach. Learn. Res., 10, 2615-2637 (2009) · Zbl 1235.68188
[94] De Grave, K., Predictive quantitative structure-activity relationship models and their use for the efficient screening of molecules (August 2011), Katholieke Universiteit Leuven, Ph.D. thesis
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.