stas/wmt14-en-de-pre-processed
Hugging Face | Updated: 2021-02-16 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/stas/wmt14-en-de-pre-processed
Resource description:
# WMT14 English-German Translation Data w/ further preprocessing
The original pre-processing script is [here](https://github.com/pytorch/fairseq/blob/master/examples/translation/prepare-wmt14en2de.sh).
This pre-processed dataset was created by running:
```
git clone https://github.com/pytorch/fairseq
cd fairseq
cd examples/translation/
./prepare-wmt14en2de.sh
```
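
The three shell steps above can also be scripted from Python, for example inside a larger data-preparation job. This is a minimal sketch, assuming `git` is installed and that `prepare-wmt14en2de.sh` runs in your environment; `build_commands` and `run_commands` are hypothetical helpers, not part of fairseq. Passing `cwd=` to `subprocess.run` stands in for the `cd` calls.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_commands(workdir: Path) -> list[tuple[list[str], Path]]:
    """Return (argv, cwd) pairs that mirror the shell steps (hypothetical helper)."""
    translation_dir = workdir / "fairseq" / "examples" / "translation"
    return [
        # git clone https://github.com/pytorch/fairseq
        (["git", "clone", "https://github.com/pytorch/fairseq"], workdir),
        # cd fairseq/examples/translation/ && ./prepare-wmt14en2de.sh
        (["./prepare-wmt14en2de.sh"], translation_dir),
    ]


def run_commands(commands: list[tuple[list[str], Path]]) -> None:
    """Run each command in its own working directory; raise on the first failure."""
    for argv, cwd in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `run_commands(build_commands(Path.cwd()))` clones fairseq into the current directory and then runs the script. Keeping command construction separate from execution lets you inspect or log the steps before anything runs.
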
It was originally used by the `transformers` example script [`finetune_trainer.py`](https://github.com/huggingface/transformers/blob/641f418e102218c4bf16fcd3124bfebed6217ef6/examples/seq2seq/finetune_trainer.py).
The data itself resides at https://cdn-datasets.huggingface.co/translation/wmt_en_de.tgz
Provided by:
stas
Summary of the original information
WMT14 English-German Translation Data: preprocessing details
Dataset source
- The original dataset was preprocessed by running the script prepare-wmt14en2de.sh.
Preprocessing steps
- Clone the repository: git clone https://github.com/pytorch/fairseq
- Enter the directory: cd fairseq/examples/translation/
- Run the script: ./prepare-wmt14en2de.sh
Dataset usage
- The dataset was originally used by the finetune_trainer.py example script in the transformers repository.
Dataset storage location
- The dataset files are stored at: https://cdn-datasets.huggingface.co/translation/wmt_en_de.tgz



