
RicardoRei/wmt-mqm-error-spans

Source: Hugging Face. Updated 2023-11-30; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/RicardoRei/wmt-mqm-error-spans
Resource summary:
---
license: apache-2.0
language:
- en
- de
- ru
- zh
tags:
- mt-evaluation
- WMT
- MQM
size_categories:
- 100K<n<1M
---

# Dataset Summary

This dataset contains all MQM human annotations from previous [WMT Metrics shared tasks](https://wmt-metrics-task.github.io/) and the MQM annotations from [Experts, Errors, and Context](https://aclanthology.org/2021.tacl-1.87/), in the form of error spans. It also contains some hallucinations used in the training of [XCOMET models](https://huggingface.co/Unbabel/XCOMET-XXL).

**Please note that this is not an official release of the data**; the original data can be found [here](https://github.com/google/wmt-mqm-human-evaluation).

The data is organised into 8 columns, of which the following are described here:

- src: input text
- mt: translation
- ref: reference translation
- annotations: list of error spans (dictionaries with 'start', 'end', 'severity', 'text')
- lp: language pair

While `en-ru` was annotated by Unbabel, `en-de` and `zh-en` were annotated by Google. This means that for en-de and zh-en you will only find minor and major errors, while for en-ru you can also find a few critical errors.

## Python usage:

```python
from datasets import load_dataset

dataset = load_dataset("RicardoRei/wmt-mqm-error-spans", split="train")
```

There is no standard train/test split for this dataset, but you can easily split it according to year, language pair or domain. E.g.:

```python
# split by LP
data = dataset.filter(lambda example: example["lp"] == "en-de")
```

## Citation Information

If you use this data please cite the following works:

- [Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation](https://aclanthology.org/2021.tacl-1.87/)
- [Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain](https://aclanthology.org/2021.wmt-1.73/)
- [Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust](https://aclanthology.org/2022.wmt-1.2/)
- [xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection](https://arxiv.org/pdf/2310.10482.pdf)
Provider:
RicardoRei
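Summary of Original Information

Dataset Overview

This dataset contains all MQM human annotations from previous WMT Metrics shared tasks and from Experts, Errors, and Context, along with some hallucinations used in the training of XCOMET models. The data is presented in the form of error spans.

Please note that this is not an official release of the data; the original data is available at https://github.com/google/wmt-mqm-human-evaluation.

The data has 8 columns, including:

  • src: input text
  • mt: translation
  • ref: reference translation
  • annotations: list of error spans (dictionaries with start, end, severity, text)
  • lp: language pair

en-ru was annotated by Unbabel, while en-de and zh-en were annotated by Google. This means that for en-de and zh-en you will only find minor and major errors, while for en-ru you may also find some critical errors.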
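To make the `annotations` structure concrete, here is a minimal sketch that counts spans by severity and marks them inside the translation. The sample record is invented for illustration and does not come from the dataset. The sketch also assumes that `start` and `end` are character offsets into `mt` with an exclusive `end`, and that spans do not overlap; neither is confirmed by the description above. If the column arrives as a JSON string rather than a list, decode it with `json.loads` first.

```python
from collections import Counter

# Hypothetical record shaped like the columns described above.
# The sentences and offsets are made up for this example.
example = {
    "src": "The dog sleeps.",
    "mt": "Die Katze schlaeft.",
    "ref": "Der Hund schläft.",
    "annotations": [
        {"start": 4, "end": 9, "severity": "major", "text": "Katze"},
        {"start": 10, "end": 18, "severity": "minor", "text": "schlaeft"},
    ],
    "lp": "en-de",
}


def severity_counts(annotations):
    """Count error spans per severity label."""
    return Counter(span["severity"] for span in annotations)


def mark_spans(mt, annotations):
    """Wrap each span in [severity: ...] markers.

    Assumes non-overlapping character offsets with an exclusive end.
    """
    pieces, cursor = [], 0
    for span in sorted(annotations, key=lambda s: s["start"]):
        pieces.append(mt[cursor:span["start"]])
        pieces.append(f"[{span['severity']}: {mt[span['start']:span['end']]}]")
        cursor = span["end"]
    pieces.append(mt[cursor:])
    return "".join(pieces)


print(severity_counts(example["annotations"]))
print(mark_spans(example["mt"], example["annotations"]))
# Die [major: Katze] [minor: schlaeft].
```

Comparing each span's `text` field against the slice `mt[start:end]` on a few real rows is a quick way to check whether the offset assumption holds.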

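Python Usage Example

```python
from datasets import load_dataset

dataset = load_dataset("RicardoRei/wmt-mqm-error-spans", split="train")
```

The dataset has no standard train/test split, but you can easily split it by year, language pair, or domain. For example:

```python
# Split by language pair
data = dataset.filter(lambda example: example["lp"] == "en-de")
```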
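Since there is no standard split, one option is to group rows by a column and hold out one group for evaluation. The sketch below does this in plain Python on toy rows invented for illustration. `split_by` is a hypothetical helper, not part of the `datasets` library. Only `lp` is confirmed as a column name by the description above; the names of the year and domain columns are not given there and should be checked against the dataset before use.

```python
from collections import defaultdict


def split_by(records, column):
    """Group row dictionaries by the value of one column."""
    groups = defaultdict(list)
    for row in records:
        groups[row[column]].append(row)
    return dict(groups)


# Toy rows for illustration; real rows carry the full set of columns.
rows = [
    {"lp": "en-de", "mt": "Hallo Welt"},
    {"lp": "zh-en", "mt": "Hello world"},
    {"lp": "en-de", "mt": "Guten Morgen"},
]

by_lp = split_by(rows, "lp")

# Hold out one language pair for evaluation and train on the rest.
held_out = "zh-en"
test_rows = by_lp[held_out]
train_rows = [row for lp, group in by_lp.items() if lp != held_out for row in group]
```

Iterating over a loaded `Dataset` yields row dictionaries, so `split_by(dataset, "lp")` should work on the real data too, although it copies every row into memory. For the full dataset, repeated `dataset.filter` calls as shown above are likely the lighter choice.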

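Citation Information

If you use this data, please cite the four works listed under Citation Information in the resource summary above.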
