RicardoRei/wmt-mqm-error-spans
Source: Hugging Face. Updated 2023-11-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/RicardoRei/wmt-mqm-error-spans
Resource description:
---
license: apache-2.0
language:
- en
- de
- ru
- zh
tags:
- mt-evaluation
- WMT
- MQM
size_categories:
- 100K<n<1M
---
# Dataset Summary
This dataset contains all MQM human annotations from previous [WMT Metrics shared tasks](https://wmt-metrics-task.github.io/) and the MQM annotations from [Experts, Errors, and Context](https://aclanthology.org/2021.tacl-1.87/) in the form of error spans. It also contains some hallucinations used to train the [XCOMET models](https://huggingface.co/Unbabel/XCOMET-XXL).
**Please note that this is not an official release of the data** and the original data can be found [here](https://github.com/google/wmt-mqm-human-evaluation).
The data is organised into 8 columns; the five described here are:
- src: input text
- mt: translation
- ref: reference translation
- annotations: List of error spans (dictionaries with 'start', 'end', 'severity', 'text')
- lp: language pair
While `en-ru` was annotated by Unbabel, `en-de` and `zh-en` were annotated by Google. This means that for `en-de` and `zh-en` you will only find minor and major errors, while for `en-ru` you can also find a few critical errors.
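To make the `annotations` column more concrete, here is a minimal sketch of how a single row's error spans could be inspected. The record below is hand-written for illustration and is not taken from the dataset. The sketch assumes that `annotations` is already a list of dictionaries, that `start` and `end` are character offsets into `mt` with an exclusive `end`, and that spans do not overlap. None of these assumptions is confirmed by the card, so check them against real rows before relying on them. The helpers `severity_counts` and `mark_spans` are hypothetical names.

```python
from collections import Counter

# Hypothetical record, written by hand to mirror the documented columns.
example = {
    "src": "This is a short test.",
    "mt": "Das ist ein langer Test.",
    "ref": "Das ist ein kurzer Test.",
    "annotations": [
        {"start": 12, "end": 18, "severity": "major", "text": "langer"},
        {"start": 0, "end": 3, "severity": "minor", "text": "Das"},
    ],
    "lp": "en-de",
}


def severity_counts(annotations):
    """Count how many error spans carry each severity label."""
    return Counter(span["severity"] for span in annotations)


def mark_spans(mt, annotations):
    """Wrap each error span of `mt` in [severity: ...] markers.

    Assumes non-overlapping spans and exclusive `end` offsets.
    """
    pieces = []
    cursor = 0
    for span in sorted(annotations, key=lambda s: s["start"]):
        pieces.append(mt[cursor:span["start"]])  # untouched text before the span
        pieces.append(f"[{span['severity']}: {mt[span['start']:span['end']]}]")
        cursor = span["end"]
    pieces.append(mt[cursor:])  # remaining text after the last span
    return "".join(pieces)


counts = severity_counts(example["annotations"])
marked = mark_spans(example["mt"], example["annotations"])
print(counts)
print(marked)
```

Counting severities per language pair in this way is one quick check of the annotator difference described above, since critical errors would be expected only in `en-ru` rows.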
## Python usage:
```python
from datasets import load_dataset
dataset = load_dataset("RicardoRei/wmt-mqm-error-spans", split="train")
```
There is no standard train/test split for this dataset, but you can easily split it by year, language pair, or domain. For example:
```python
# split by LP
data = dataset.filter(lambda example: example["lp"] == "en-de")
```
## Citation Information
If you use this data please cite the following works:
- [Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation](https://aclanthology.org/2021.tacl-1.87/)
- [Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain](https://aclanthology.org/2021.wmt-1.73/)
- [Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust](https://aclanthology.org/2022.wmt-1.2/)
- [xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection](https://arxiv.org/pdf/2310.10482.pdf)
Provider:
RicardoRei



