WMT17 APE dataset

Name: WMT17 APE dataset
Creator: WMT (Workshop on Machine Translation)
License: not specified

Source: arXiv; indexed 2025-09-30

Download link:

https://www.statmt.org/wmt17/ape-task.html

Description:

This is the WMT17 Automatic Post-Editing (APE) dataset. It consists of triplets, each containing a real source sentence, its machine translation, and the corresponding post-edited sentence, and it is designed specifically for automatic post-editing experiments. The dataset has also been oversampled to yield approximately 5 million triplets for pre-training and APE training. In terms of scale, there are 23,000 real triplets for training and 2,000 for testing, plus 500,000 high-quality and 4 million low-quality synthetic triplets. The task addressed is Automatic Post-Editing (APE).
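The triplet structure and the stated sizes can be sketched in Python as follows. This is only an illustration: the names `ApeTriplet`, `oversample`, `SIZES` and `combined_size` are hypothetical and not part of any official WMT tooling, and the oversampling factor is a free parameter because the description does not state which value was used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ApeTriplet:
    """One APE example (hypothetical container, not an official format)."""
    source: str     # original source sentence
    mt: str         # raw machine translation of the source
    post_edit: str  # post-edited version of the machine translation


def oversample(real: List[ApeTriplet], factor: int) -> List[ApeTriplet]:
    """Repeat the real triplets `factor` times (factor is an assumption)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return real * factor


# Training-side sizes as stated in the description.
SIZES = {
    "real_train": 23_000,
    "synthetic_high_quality": 500_000,
    "synthetic_low_quality": 4_000_000,
}


def combined_size(real_factor: int) -> int:
    """Total training triplets if the real data is repeated `real_factor` times."""
    return (
        SIZES["real_train"] * real_factor
        + SIZES["synthetic_high_quality"]
        + SIZES["synthetic_low_quality"]
    )
```

Without oversampling, `combined_size(1)` gives 4,523,000 triplets, so the quoted figure of roughly 5 million would correspond to repeating the real data a number of times; the exact factor is not given in the description and would need to be checked against the original source.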

Provider:

WMT (Workshop on Machine Translation)
