German-Upper Sorbian (de-hsb) parallel corpus

Indexed from arXiv on 2025-09-30

Download link:

https://www.statmt.org/wmt20/unsup_and_very_low_res/

Summary:

This dataset contains German-Upper Sorbian parallel sentence pairs for training, validating, and testing low-resource machine translation systems. It aims to improve Neural Machine Translation (NMT) models by supplying high-quality synthetic data. The training set contains 60,000 sentence pairs, and the validation and test sets contain 2,000 sentence pairs each.
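The listing does not describe the file layout. Below is a minimal sketch, assuming each split is distributed as two line-aligned UTF-8 text files (one German, one Upper Sorbian, one sentence per line). It pairs the two sides and checks each split against the sizes stated above. The helper names and the file names in the commented usage lines are hypothetical.

```python
# Split sizes (sentence pairs) as stated in the dataset description.
EXPECTED_SIZES = {"train": 60_000, "validation": 2_000, "test": 2_000}


def pair_lines(src_lines, tgt_lines):
    """Zip two line-aligned sequences into (source, target) pairs.

    Trailing newlines are stripped. Raises ValueError if the two sides
    differ in length, which usually means the corpus is misaligned.
    """
    src = [line.rstrip("\n") for line in src_lines]
    tgt = [line.rstrip("\n") for line in tgt_lines]
    if len(src) != len(tgt):
        raise ValueError(f"misaligned sides: {len(src)} vs {len(tgt)} lines")
    return list(zip(src, tgt))


def load_parallel(src_path, tgt_path):
    """Read two UTF-8 text files (one sentence per line) and pair them."""
    with open(src_path, encoding="utf-8") as f_src, open(tgt_path, encoding="utf-8") as f_tgt:
        return pair_lines(f_src, f_tgt)


def check_split_size(split, pairs):
    """Return True if the split holds the number of pairs the description states."""
    return len(pairs) == EXPECTED_SIZES[split]


# Hypothetical usage; replace the file names with those in the downloaded archive:
# pairs = load_parallel("train.de", "train.hsb")
# print(check_split_size("train", pairs))
```

Raising an error on a length mismatch, rather than silently truncating with `zip`, surfaces misaligned files early. A single dropped line would otherwise shift every later pair.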

Provided by:

WMT