MLQE
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/facebookresearch/mlqe
Description:
MLQE is built from Wikipedia articles and provides aligned source-translation pairs for high-, mid-, and low-resource languages, with a human-annotated label for every pair. The language pairs include high-resource pairs (such as English-German and English-Chinese), mid-resource pairs (such as Romanian-English and Estonian-English), and low-resource pairs (such as Nepali-English and Sinhala-English). The dataset contains 7,000 training pairs, 1,000 validation pairs, and 1,000 test pairs. Its target task is quality estimation.
Provider:
Wikipedia



