Computer Science > Computation and Language

arXiv:2105.05975 (cs)

[Submitted on 12 May 2021]

Title: Analysing The Impact Of Linguistic Features On Cross-Lingual Transfer

Authors: Błażej Dolicki, Gerasimos Spanakis

Abstract: There is increasing evidence that, when little or no data is available in a target language, training on a different language can yield surprisingly good results. However, there are currently no established guidelines for choosing the training (source) language. In an attempt to solve this issue, we thoroughly analyze a state-of-the-art multilingual model and try to determine what affects the quality of transfer between languages. In contrast to the majority of the multilingual NLP literature, we train not only on English but on a group of almost 30 languages. We show that looking at particular syntactic features is 2-4 times more helpful in predicting performance than an aggregated syntactic similarity. We find that the importance of syntactic features differs strongly depending on the downstream task: no single feature is a good performance predictor for all NLP tasks. As a result, one should not expect that for a target language $L_1$ there is a single language $L_2$ that is the best choice for every NLP task (for instance, for Bulgarian, the best source language is French on POS tagging, Russian on NER and Thai on NLI). We discuss the most important linguistic features affecting the transfer quality using statistical and machine learning methods.

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2105.05975 [cs.CL]
	(or arXiv:2105.05975v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2105.05975

Submission history

From: Błażej Dolicki
[v1] Wed, 12 May 2021 21:22:58 UTC (180 KB)
