Computer Science > Computation and Language

arXiv:2108.09484 (cs)

[Submitted on 21 Aug 2021 (v1), last revised 15 Nov 2021 (this version, v3)]

Title: cushLEPOR: customising hLEPOR metric using Optuna for higher agreement with human judgments or pre-trained language model LaBSE

Authors: Lifeng Han, Irina Sorokina, Gleb Erofeev, Serge Gladkoff


Abstract: Human evaluation has always been expensive, while researchers struggle to trust automatic metrics. To address this, we propose customising traditional metrics by taking advantage of pre-trained language models (PLMs) and the limited human-labelled scores that are available. We first re-introduce the hLEPOR metric factors, followed by the Python version we developed (ported), which enables automatic tuning of the weighting parameters in the hLEPOR metric. We then present the customised hLEPOR (cushLEPOR), which uses the Optuna hyper-parameter optimisation framework to fine-tune the hLEPOR weighting parameters towards better agreement with pre-trained language models (using LaBSE) for the exact MT language pairs that cushLEPOR is deployed on. We also optimise cushLEPOR towards professional human evaluation data based on the MQM and pSQM frameworks on the English-German and Chinese-English language pairs. The experimental investigations show that cushLEPOR boosts hLEPOR performance towards better agreement with PLMs like LaBSE at much lower cost, and towards better agreement with human evaluations including MQM and pSQM scores, and that it yields much better performance than BLEU (data available at this https URL). Official results show that our submissions win three language pairs: English-German and Chinese-English on the News domain via cushLEPOR(LM), and English-Russian on the TED domain via hLEPOR.
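The tuning idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the metric combines three factor scores through a weighted harmonic mean, it uses a plain seeded random search as a stand-in for Optuna's samplers, and it measures agreement with Pearson correlation. The factor values and target scores below are hypothetical toy data; in the paper's setting the targets would be LaBSE similarities or human MQM/pSQM scores.

```python
import random


def weighted_harmonic_mean(factors, weights):
    """Weighted harmonic mean: sum(w) / sum(w / f). Factors must be > 0."""
    return sum(weights) / sum(w / f for w, f in zip(weights, factors))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def agreement(factor_rows, target_scores, weights):
    """Correlation between the weighted metric and the target scores."""
    metric_scores = [weighted_harmonic_mean(row, weights) for row in factor_rows]
    return pearson(metric_scores, target_scores)


def tune_weights(factor_rows, target_scores, n_trials=200, seed=0):
    """Random search over weights (a stand-in for an Optuna study)."""
    rng = random.Random(seed)
    best_weights = (1.0, 1.0, 1.0)  # start from the untuned, equal-weight metric
    best_corr = agreement(factor_rows, target_scores, best_weights)
    for _ in range(n_trials):
        candidate = tuple(rng.uniform(0.1, 10.0) for _ in range(3))
        corr = agreement(factor_rows, target_scores, candidate)
        if corr > best_corr:
            best_weights, best_corr = candidate, corr
    return best_weights, best_corr


# Hypothetical per-segment factor scores and target scores (toy data).
factor_rows = [
    (0.90, 0.80, 0.70),
    (0.50, 0.90, 0.40),
    (0.70, 0.60, 0.80),
    (0.30, 0.50, 0.20),
    (0.95, 0.90, 0.85),
]
target_scores = [0.75, 0.45, 0.70, 0.20, 0.90]

baseline_corr = agreement(factor_rows, target_scores, (1.0, 1.0, 1.0))
tuned_weights, tuned_corr = tune_weights(factor_rows, target_scores)
print("baseline:", round(baseline_corr, 4), "tuned:", round(tuned_corr, 4))
```

Because the search starts from the equal-weight setting, the tuned correlation on the tuning data can never be lower than the baseline. Whether the gain carries over to unseen segments is a separate question that this sketch does not address.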

Comments:	Forthcoming: in Proceedings of the Sixth Conference on Machine Translation (WMT2021)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2108.09484 [cs.CL]
	(or arXiv:2108.09484v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2108.09484

Submission history

From: Lifeng Han
[v1] Sat, 21 Aug 2021 10:21:21 UTC (1,140 KB)
[v2] Mon, 30 Aug 2021 22:37:54 UTC (1,150 KB)
[v3] Mon, 15 Nov 2021 12:17:15 UTC (522 KB)
