MLQE

Name: MLQE
Creator: Wikipedia
License: No description available

Indexed from arXiv: 2025-09-30

Download link:

https://github.com/facebookresearch/mlqe


Description:

This dataset is derived from Wikipedia articles and covers aligned corpora for high-, mid-, and low-resource languages, with every translation pair carrying human-annotated labels. The language pairs include high-resource pairs (such as English-German and English-Chinese), mid-resource pairs (such as Romanian-English and Estonian-English), and low-resource pairs (such as Nepali-English and Sinhala-English). The dataset comprises 7,000 training pairs, 1,000 validation pairs, and 1,000 test pairs. Its task is quality estimation.
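
As a rough illustration of the structure described above, the following Python sketch records the named language pairs by resource level together with the stated split sizes. The pair codes (such as "en-de"), the dictionary names, and the helper functions are hypothetical labels chosen for this example; they are not taken from the repository, and the description lists these pairs only as examples.

```python
# Language pairs named in the description, grouped by resource level.
# The codes below are illustrative labels, not file or directory names
# from the MLQE repository.
LANGUAGE_PAIRS = {
    "high": ["en-de", "en-zh"],  # English-German, English-Chinese
    "mid": ["ro-en", "et-en"],   # Romanian-English, Estonian-English
    "low": ["ne-en", "si-en"],   # Nepali-English, Sinhala-English
}

# Split sizes as stated in the description.
SPLIT_SIZES = {"train": 7000, "validation": 1000, "test": 1000}


def split_fractions(sizes):
    """Return each split's share of the total number of pairs."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def resource_level(pair):
    """Look up the resource level of a language-pair code."""
    for level, pairs in LANGUAGE_PAIRS.items():
        if pair in pairs:
            return level
    raise KeyError(f"unknown language pair: {pair}")


if __name__ == "__main__":
    for name, share in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
    print(resource_level("ne-en"))
```

Under these numbers the training split accounts for roughly 78% of the pairs, with validation and test at about 11% each.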

Provider:

Wikipedia
