An inductive learning system for XML documents. (English) Zbl 1136.68509

Blockeel, Hendrik (ed.) et al., Inductive logic programming. 17th international conference, ILP 2007, Corvallis, OR, USA, June 19–21, 2007. Revised selected papers. Berlin: Springer (ISBN 978-3-540-78468-5/pbk). Lecture Notes in Computer Science 4894. Lecture Notes in Artificial Intelligence, 292-306 (2008).

Summary: This paper presents a complete inductive learning system that aims to produce comprehensible theories for XML document classification. The knowledge representation method is based on a higher-order logic formalism that is particularly suitable for structured-data learning systems. A systematic way of generating predicates is also given. The learning algorithm of the system is a modified standard decision-tree learning algorithm driven by the precision/recall breakeven point. Experimental results on an XML version of the Reuters dataset show that this system is able to produce comprehensible theories with high precision/recall breakeven point values.
For the entire collection see [Zbl 1132.68005].
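
The summary names the precision/recall breakeven point as the quantity driving the learner but does not say how it is computed. The following is a minimal sketch of one common way to obtain it for a ranked list of documents, not the procedure used in the paper: cutting the ranking at k equal to the number of truly positive documents makes the predicted and actual positive counts equal, so precision and recall coincide. The function name `breakeven_point` and its `scores` and `labels` parameters are assumptions introduced here for illustration.

```python
from typing import Sequence


def breakeven_point(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Precision/recall breakeven point of a ranked list (illustrative sketch).

    scores: classifier confidence that each document is in the category.
    labels: True where the document really is in the category.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    n_pos = sum(1 for y in labels if y)
    if n_pos == 0:
        raise ValueError("breakeven point is undefined without positive examples")

    # Rank documents from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    # Predict the top n_pos documents as positive. Then
    #   precision = hits / n_pos  and  recall = hits / n_pos,
    # so the two measures are equal at this cut-off.
    hits = sum(1 for _, y in ranked[:n_pos] if y)
    return hits / n_pos
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.2]` and labels `[True, False, True, True, False]` there are three positives, two of which fall in the top three ranks, so both precision and recall equal 2/3 at the cut-off.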

MSC:

68T05	Learning and adaptive systems in artificial intelligence
68T27	Logic in artificial intelligence
68T30	Knowledge representation

Keywords:

higher-order logic; knowledge representation; XML documents; precision-recall; decision-tree learning

