Multi-lingual dependency parsing evaluation: a large-scale analysis of word order properties using artificial data

K Gulordava, P Merlo - Transactions of the Association for Computational Linguistics, 2016 - direct.mit.edu
Abstract
The growing work in multi-lingual parsing faces the challenge of fair comparative evaluation and performance analysis across languages and their treebanks. The difficulty lies in teasing apart the properties of treebanks, such as their size or average sentence length, from those of the annotation scheme, and from the linguistic properties of languages. We propose a method to evaluate the effects of word order of a language on dependency parsing performance, while controlling for confounding treebank properties. The method uses artificially-generated treebanks that are minimal permutations of actual treebanks with respect to two word order properties: word order variation and dependency lengths. Based on these artificial data on twelve languages, we show that longer dependencies and higher word order variability degrade parsing performance. Our method also extends to minimal pairs of individual sentences, leading to a finer-grained understanding of parsing errors.
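
To make the dependency-length property concrete, here is a minimal Python sketch that measures the total dependency length of one sentence and then reorders its words without changing the tree. The head-index encoding, the function names, and the toy tree are assumptions made for illustration only; this is not the authors' procedure for generating the artificial treebanks.

```python
def dependency_lengths(heads):
    """Return the linear distance between each non-root token and its head.

    heads[i] is the 1-based position of the head of token i+1; 0 marks the root.
    (This encoding is an assumption, chosen to resemble CoNLL-style head columns.)
    """
    return [abs((i + 1) - h) for i, h in enumerate(heads) if h != 0]


def permute(heads, order):
    """Reorder the tokens while keeping the same dependency tree.

    order lists the original 1-based positions in their new linear order.
    Returns the head list re-indexed to the new positions.
    """
    new_pos = {old: new for new, old in enumerate(order, start=1)}
    new_heads = [0] * len(heads)
    for old, new in new_pos.items():
        h = heads[old - 1]
        new_heads[new - 1] = 0 if h == 0 else new_pos[h]
    return new_heads


# Toy tree (hypothetical): token 4 is the root, tokens 1-3 are its dependents.
original = [4, 4, 4, 0]
# Move the head between its dependents: same tree, different word order.
reordered = permute(original, [1, 4, 2, 3])

print(sum(dependency_lengths(original)))   # total length in the original order
print(sum(dependency_lengths(reordered)))  # total length after the permutation
```

The two sentences form a minimal pair in the sense used above: they share the same words and the same tree and differ only in linear order, so any difference in a parser's accuracy on them can be attributed to word order rather than to treebank size or annotation.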
MIT Press