RicardoRei/wmt-mqm-error-spans

Name: RicardoRei/wmt-mqm-error-spans
Creator: RicardoRei
Published: 2023-11-30 19:14:18
License: No description available

Source: Hugging Face; updated 2023-11-30; indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/RicardoRei/wmt-mqm-error-spans

Resource description:

---
license: apache-2.0
language:
- en
- de
- ru
- zh
tags:
- mt-evaluation
- WMT
- MQM
size_categories:
- 100K<n<1M
---

# Dataset Summary

This dataset contains all MQM human annotations from previous [WMT Metrics shared tasks](https://wmt-metrics-task.github.io/) and the MQM annotations from [Experts, Errors, and Context](https://aclanthology.org/2021.tacl-1.87/), in the form of error spans. It also contains some hallucinations used to train the [XCOMET models](https://huggingface.co/Unbabel/XCOMET-XXL).

**Please note that this is not an official release of the data**; the original data can be found [here](https://github.com/google/wmt-mqm-human-evaluation).

The data is organised into 8 columns, including:

- src: input text
- mt: translation
- ref: reference translation
- annotations: list of error spans (dictionaries with 'start', 'end', 'severity', 'text')
- lp: language pair

Unbabel annotated `en-ru`, while Google annotated `en-de` and `zh-en`. As a result, `en-de` and `zh-en` contain only minor and major errors, while `en-ru` also contains a few critical errors.

## Python usage

```python
from datasets import load_dataset

dataset = load_dataset("RicardoRei/wmt-mqm-error-spans", split="train")
```

There is no standard train/test split for this dataset, but you can easily split it by year, language pair or domain. For example:

```python
# split by language pair (LP)
data = dataset.filter(lambda example: example["lp"] == "en-de")
```

## Citation Information

If you use this data, please cite the following works:

- [Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation](https://aclanthology.org/2021.tacl-1.87/)
- [Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain](https://aclanthology.org/2021.wmt-1.73/)
- [Results of WMT22 Metrics Shared Task: Stop Using BLEU – Neural Metrics Are Better and More Robust](https://aclanthology.org/2022.wmt-1.2/)
- [xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection](https://arxiv.org/pdf/2310.10482.pdf)

Provider:

RicardoRei

Summary of original information

Dataset overview

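This dataset contains all MQM human annotations from previous WMT Metrics shared tasks and from Experts, Errors, and Context. It also contains some hallucination data used to train the XCOMET models. The data is presented in the form of error spans.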

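Please note that this is not an official release of the data. The original data can be found at https://github.com/google/wmt-mqm-human-evaluation.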

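The data is organised into 8 columns, including:

src: input text
mt: translation
ref: reference translation
annotations: list of error spans (dictionaries with start, end, severity, text)
lp: language pair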
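Each entry in `annotations` describes one error span. The following is a minimal sketch for recovering the span text from the translation. It assumes that `start` and `end` are character offsets into the `mt` string with `end` exclusive; the text above does not state this. The record is a made-up example, not a row from the dataset, and `extract_spans` is a hypothetical helper.

```python
from typing import Any


def extract_spans(record: dict[str, Any]) -> list[dict[str, Any]]:
    """Return each error span of a record together with the slice of `mt` it points to.

    Assumption: `start`/`end` are character offsets into `mt` (end exclusive).
    The `matches` flag reports whether that slice equals the stored `text`,
    which is a cheap way to check the offset assumption on real rows.
    """
    mt = record["mt"]
    spans = []
    for ann in record["annotations"]:
        sliced = mt[ann["start"]:ann["end"]]
        spans.append(
            {
                "severity": ann["severity"],
                "text": ann["text"],
                "slice": sliced,
                "matches": sliced == ann["text"],
            }
        )
    return spans


# Hypothetical record shaped like one row of the dataset (invented values).
example = {
    "src": "The cat sat on the mat.",
    "mt": "Die Katze saß auf dem Hut.",
    "ref": "Die Katze saß auf der Matte.",
    "annotations": [{"start": 22, "end": 25, "severity": "major", "text": "Hut"}],
    "lp": "en-de",
}

for span in extract_spans(example):
    print(span)
```

Running this over a sample of real rows and checking that `matches` is always true is a quick way to confirm or reject the offset assumption before relying on it.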

Unbabel annotated en-ru, while Google annotated en-de and zh-en. As a result, en-de and zh-en contain only minor and major errors, while en-ru also contains some critical errors.
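Because the available severity labels differ by language pair, it can help to check the label distribution before training on the spans. The following is a minimal sketch over a list of row dicts. The spellings `"minor"`, `"major"` and `"critical"` are an assumption about how labels are stored, so the code lowercases them to be tolerant. The rows are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Any, Iterable


def severity_counts(rows: Iterable[dict[str, Any]]) -> dict[str, Counter]:
    """Count error-span severities separately for each language pair."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        for ann in row["annotations"]:
            # Lowercase in case labels are stored as e.g. "Major" (assumed spelling).
            counts[row["lp"]][ann["severity"].lower()] += 1
    return dict(counts)


# Invented rows; real rows would come from the loaded dataset.
rows = [
    {"lp": "en-de", "annotations": [{"severity": "minor"}, {"severity": "major"}]},
    {"lp": "zh-en", "annotations": [{"severity": "Minor"}]},
    {"lp": "en-ru", "annotations": [{"severity": "critical"}, {"severity": "minor"}]},
]

for lp, counter in sorted(severity_counts(rows).items()):
    print(lp, dict(counter))
```

Iterating a `datasets.Dataset` yields one dict per row, so the loaded dataset could likely be passed in directly. This assumes the `annotations` column comes back as a list of dicts.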

Python usage example

```python
from datasets import load_dataset

dataset = load_dataset("RicardoRei/wmt-mqm-error-spans", split="train")
```

There is no standard train/test split for this dataset, but you can easily split it by year, language pair or domain. For example:

```python
# split by language pair
data = dataset.filter(lambda example: example["lp"] == "en-de")
```
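The same filtering idea extends to a held-out split. The following is a minimal sketch in plain Python that sends every row whose key value is in a chosen set to the test side. It assumes the rows carry a `year` field. The text mentions splitting by year and domain but does not name those columns. The helper `split_by` and the sample rows are hypothetical.

```python
from typing import Any, Iterable


def split_by(
    rows: Iterable[dict[str, Any]], key: str, test_values: set
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows into (train, test): rows whose `key` value is in `test_values` go to test."""
    train, test = [], []
    for row in rows:
        (test if row[key] in test_values else train).append(row)
    return train, test


# Hypothetical rows; the "year" column name is an assumption.
rows = [
    {"lp": "en-de", "year": 2020},
    {"lp": "en-de", "year": 2021},
    {"lp": "zh-en", "year": 2022},
    {"lp": "en-ru", "year": 2022},
]

train, test = split_by(rows, "year", {2022})
print(len(train), len(test))  # 2 2
```

Holding out whole years keeps segments from the same evaluation campaign on one side of the split. With the Hugging Face dataset itself, the equivalent would be two `dataset.filter` calls with complementary predicates, again assuming a `year` column exists.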

Citation information

If you use this data, please cite the works listed under Citation Information in the dataset card above.
