MLQE-PE
Source: arXiv. Updated: 2021-10-11. Indexed: 2024-06-21.
Download link:
https://github.com/sheffieldnlp/mlqe-pe
Resource description:
MLQE-PE is a multilingual dataset for quality estimation and post-editing, created by the University of Sheffield in collaboration with several other institutions. It covers 11 language pairs, each with up to 10,000 translations. Its annotations include sentence-level direct assessment scores, sentence-level post-editing effort, and word-level good/bad labels; the post-edited sentences and the titles of the source articles are also provided. The source text was collected from Wikipedia and other sources and translated with state-of-the-art neural machine translation models. The dataset is mainly used for machine translation quality estimation and automatic post-editing, and targets quality problems in machine translation, particularly for low-resource language pairs.
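The annotation layers listed above can be pictured as one record per translation. The following is only a minimal sketch: the class name, the field names, the "en-de" example value, and the "OK"/"BAD" string encoding are all assumptions made for illustration and are not taken from the dataset's files, whose actual layout should be checked in the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QERecord:
    """Hypothetical in-memory view of one translation instance.

    Every field name here is illustrative, not the dataset's real schema.
    """
    lang_pair: str                         # e.g. "en-de" (example value only)
    source: str                            # original sentence
    mt: str                                # machine translation output
    da_score: Optional[float] = None       # sentence-level direct assessment
    pe_effort: Optional[float] = None      # sentence-level post-editing effort
    post_edit: Optional[str] = None        # human post-edited translation
    word_tags: Optional[List[str]] = None  # assumed: one "OK"/"BAD" label per MT token
    article_title: Optional[str] = None    # title of the source article


def bad_word_ratio(record: QERecord) -> Optional[float]:
    """Return the fraction of tokens labelled "BAD", or None if there are no tags."""
    if not record.word_tags:
        return None
    bad = sum(tag == "BAD" for tag in record.word_tags)
    return bad / len(record.word_tags)


if __name__ == "__main__":
    example = QERecord(
        lang_pair="en-de",
        source="a b c d",
        mt="w x y z",
        word_tags=["OK", "BAD", "OK", "OK"],
    )
    print(bad_word_ratio(example))
```

Keeping the sentence-level and word-level annotations as optional fields reflects that a given use (quality estimation versus automatic post-editing) may need only some of them.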
Provider:
University of Sheffield
Created:
2020-10-09



