Discriminative reranking for natural language parsing. (English) Zbl 1234.68404
Summary: This article considers approaches which rerank the output of an existing probabilistic parser. The base parser produces a set of candidate parses for each input sentence, with associated probabilities that define an initial ranking of these parses. A second model then attempts to improve upon this initial ranking, using additional features of the tree as evidence. The strength of our approach is that it allows a tree to be represented as an arbitrary set of features, without concerns about how these features interact or overlap and without the need to define a derivation or a generative model which takes these features into account. We introduce a new method for the reranking task, based on the boosting approach to ranking problems described in [Y. Freund, R. Iyer, R. E. Schapire and Y. Singer, “An efficient boosting algorithm for combining preferences”, in: Machine learning: Proceedings of the 15th international conference (1998)]. We apply the boosting method to parsing the Wall Street Journal treebank. The method combined the log-likelihood under a baseline model with evidence from an additional 500,000 features over parse trees that were not included in the original model. The new model achieved 89.75% F-measure, a 13% relative reduction in F-measure error over the baseline model’s score of 88.2%. The article also introduces a new algorithm for the boosting approach which takes advantage of the sparsity of the feature space in the parsing data. Experiments show significant efficiency gains for the new algorithm over the obvious implementation of the boosting approach. We argue that the method is an appealing alternative, in terms of both simplicity and efficiency, to work on feature selection methods within log-linear (maximum-entropy) models. Although the experiments in this article are on natural language parsing (NLP), the approach should be applicable to many other NLP problems which are naturally framed as ranking tasks, for example, speech recognition, machine translation, or natural language generation.
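The following is a minimal sketch of the boosting-style reranking update the summary describes, not the authors' implementation. It assumes a simplified, hypothetical data layout: each training sentence contributes pairs (best candidate parse vs. an alternative), a parse is reduced to its base-model log-probability plus a set of binary feature ids, and the ranking score is F(x) = w0·log P_base(x) + Σ alpha[k] over the features of x. The names train_reranker, pairs, and alpha are illustrative, not from the article.

import math
from collections import defaultdict

def train_reranker(pairs, w0=1.0, rounds=100):
    """pairs: list of (logp_best, feats_best, logp_alt, feats_alt),
    where feats_* are sets of binary feature ids (assumed layout)."""
    alpha = defaultdict(float)
    # Margin of each pair under the current model; with all alpha at
    # zero this is just the baseline log-likelihood difference.
    margin = [w0 * (lb - la) for lb, fb, la, fa in pairs]
    # Inverted index: feature id -> pairs whose margin it can change.
    # This is where sparsity pays off: updating alpha[k] only touches
    # the margins of pairs in index[k].
    index = defaultdict(list)
    for i, (lb, fb, la, fa) in enumerate(pairs):
        for k in fb ^ fa:           # features present on exactly one side
            index[k].append(i)
    for _ in range(rounds):
        # Greedily pick the single feature whose coordinate update most
        # reduces the exponential loss  ExpLoss = sum_i exp(-margin[i]).
        best_k, best_gain, best_delta = None, 0.0, 0.0
        for k, idxs in index.items():
            wp = wm = 0.0           # the W+ / W- statistics of the update
            for i in idxs:
                e = math.exp(-margin[i])
                if k in pairs[i][1]:
                    wp += e         # feature favours the better parse
                else:
                    wm += e         # feature favours the alternative
            if wp > 0 and wm > 0:
                gain = (math.sqrt(wp) - math.sqrt(wm)) ** 2
                if gain > best_gain:
                    best_k, best_gain = k, gain
                    best_delta = 0.5 * math.log(wp / wm)
        if best_k is None:
            break
        alpha[best_k] += best_delta
        for i in index[best_k]:     # update only the affected margins
            margin[i] += best_delta if best_k in pairs[i][1] else -best_delta
    return alpha

Note that this sketch exploits sparsity only in the margin updates; the per-round scan still recomputes the W+/W- statistics for every feature from scratch, which is essentially the "obvious implementation" the article improves upon. The article's efficient algorithm additionally caches and incrementally maintains these statistics so that each round's cost scales with the features actually touched.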
MSC:
68T50 Natural language processing
68T05 Learning and adaptive systems in artificial intelligence
Software:
Penn Treebank
References:
[1] Abney, Steven, Computational Linguistics 23 (4), pp. 597– (1997) · Zbl 1128.68106
[2] Berger, Adam L., Computational Linguistics 22 (1), pp. 39– (1996)
[3] doi:10.1109/34.588021
[4] doi:10.1006/jcss.1997.1504 · Zbl 0880.68103
[5] doi:10.1023/A:1007662407062 · Zbl 0944.68535
[6] doi:10.1214/aos/1016218223 · Zbl 1106.62323
[7] doi:10.1006/jcss.1995.1011 · Zbl 0939.68771
[8] doi:10.1016/S0022-0000(75)80019-5 · Zbl 0326.68053
[9] Marcus, Mitchell, Computational Linguistics 19 (2), pp. 313– (1993)
[10] doi:10.1214/aos/1024691352 · Zbl 0929.62069
[11] doi:10.1023/A:1007614523901 · Zbl 0945.68194
[12] doi:10.1023/A:1007649029923 · Zbl 0951.68561
[13] doi:10.1145/1968.1972 · Zbl 0587.68077