WMT17 APE dataset
Source: arXiv. Indexed: 2025-09-30
Download link:
https://www.statmt.org/wmt17/ape-task.html
Overview:
The WMT17 Automatic Post-Editing (APE) dataset consists of triplets, each pairing a real source sentence with its machine translation and the corresponding post-edited sentence. It is designed specifically for APE experiments. It provides 23,000 real triplets for training and 2,000 for testing, along with 500,000 high-quality and 4 million low-quality synthetic triplets. The data has also been oversampled to yield approximately 5 million triplets for pre-training and APE training.
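The triplet structure described above can be illustrated with a minimal Python sketch. It assumes the data is stored as three line-aligned UTF-8 text files (one each for source, machine translation, and post-edit); that layout, the `Triplet` class, and the `read_triplets` function are hypothetical names chosen for illustration and are not taken from the dataset documentation.

```python
from dataclasses import dataclass
from itertools import zip_longest
from pathlib import Path
from typing import Iterator


@dataclass(frozen=True)
class Triplet:
    """One APE example (hypothetical container, not an official format)."""
    src: str  # source sentence
    mt: str   # machine translation of the source
    pe: str   # post-edited version of the machine translation


def read_triplets(src_path: Path, mt_path: Path, pe_path: Path) -> Iterator[Triplet]:
    """Yield triplets from three files assumed to be aligned line by line."""
    with open(src_path, encoding="utf-8") as fs, \
         open(mt_path, encoding="utf-8") as fm, \
         open(pe_path, encoding="utf-8") as fp:
        for n, (s, m, p) in enumerate(zip_longest(fs, fm, fp), start=1):
            # zip_longest pads the shorter files with None, which exposes
            # a line-count mismatch instead of silently truncating.
            if s is None or m is None or p is None:
                raise ValueError(f"files are not line-aligned at line {n}")
            yield Triplet(s.rstrip("\n"), m.rstrip("\n"), p.rstrip("\n"))
```

Failing on a line-count mismatch is a deliberate choice here: a silent truncation would pair sentences with the wrong translations for the rest of the file.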
Provider:
WMT (Workshop on Machine Translation)



