wmt/wmt_t2t
Dataset overview
Basic information
- Name: WMT T2T
- Languages: German (de), English (en)
- License: unknown
- Multilinguality: translation
- Size: 10M<n<100M
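Dataset structure
- Features:
  - translation: multilingual strings in German and English
- Splits:
  - train: 4592289 examples, 1385106499 bytes
  - validation: 3000 examples, 736407 bytes
  - test: 3003 examples, 777326 bytes
- Download size: 835031826 bytes
- Dataset size: 1386620232 bytes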
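As a quick consistency check, the per-split byte counts listed above add up to the reported dataset size. The sketch below verifies that arithmetic and also illustrates the record shape suggested by the `translation` feature. The record layout (a dict keyed by language code), the sample sentence pair, and the helper name `to_pair` are assumptions made for illustration and are not taken from the card.

```python
# Split statistics copied from the card: (examples, bytes).
SPLITS = {
    "train": (4592289, 1385106499),
    "validation": (3000, 736407),
    "test": (3003, 777326),
}
DATASET_SIZE_BYTES = 1386620232

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# Assumed record shape: one "translation" dict keyed by language code.
# The sentences are invented for illustration, not taken from the data.
sample = {"translation": {"de": "Guten Morgen.", "en": "Good morning."}}


def to_pair(record, src="de", tgt="en"):
    """Return (source, target) strings from a translation record."""
    pair = record["translation"]
    return pair[src], pair[tgt]


print(total_examples)                     # 4598292
print(total_bytes == DATASET_SIZE_BYTES)  # True
print(to_pair(sample))                    # ('Guten Morgen.', 'Good morning.')
```

Swapping `src` and `tgt` would give English-to-German pairs from the same records.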
Data sources
- Source datasets:
  - europarl_bilingual
  - news_commentary
  - opus_paracrawl
  - un_multi
Configuration
- Configuration name: de-en
- Default configuration: yes
- Data file paths:
  - train: de-en/train-*
  - validation: de-en/validation-*
  - test: de-en/test-*
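The configuration above can be sketched in code as follows. This is a minimal example, assuming the dataset is hosted on the Hugging Face Hub under the ID `wmt/wmt_t2t` and that the `datasets` library is installed. The helper names `split_for_file` and `load_split` are hypothetical, and the shard file names used in the usage note are invented to show how the glob patterns behave.

```python
from fnmatch import fnmatchcase

# Split-to-pattern mapping copied from the configuration above.
DATA_FILES = {
    "train": "de-en/train-*",
    "validation": "de-en/validation-*",
    "test": "de-en/test-*",
}


def split_for_file(path):
    """Return the split whose glob pattern matches `path`, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


def load_split(split="validation"):
    """Load one split of the de-en configuration.

    Assumes the `datasets` library and network access. The import is kept
    local so the rest of this sketch runs without either.
    """
    from datasets import load_dataset

    return load_dataset("wmt/wmt_t2t", "de-en", split=split)
```

For instance, `split_for_file("de-en/train-00000-of-00003.parquet")` would resolve to `"train"` for that hypothetical shard name, and `load_split("test")` would be expected to fetch the 3003-example test split, given network access.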
Citation
@InProceedings{bojar-EtAl:2014:W14-33,
  author    = {Bojar, Ondrej and Buck, Christian and Federmann, Christian and Haddow, Barry and Koehn, Philipp and Leveling, Johannes and Monz, Christof and Pecina, Pavel and Post, Matt and Saint-Amand, Herve and Soricut, Radu and Specia, Lucia and Tamchyna, Ale\v{s}},
  title     = {Findings of the 2014 Workshop on Statistical Machine Translation},
  booktitle = {Proceedings of the Ninth Workshop on Statistical Machine Translation},
  month     = {June},
  year      = {2014},
  address   = {Baltimore, Maryland, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {12--58},
  url       = {http://www.aclweb.org/anthology/W/W14/W14-3302}
}



