jvamvas/apertus-pretrain-romansh-backtranslated
Source: Hugging Face | Updated: 2025-12-15 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/jvamvas/apertus-pretrain-romansh-backtranslated
Description:
A version of the swiss-ai/apertus-pretrain-romansh dataset (https://hf.co/datasets/swiss-ai/apertus-pretrain-romansh), restricted to the `monolingual` split and augmented with German translations generated by a machine translation (MT) system. The dataset is intended for training MT systems or LLMs on idiom-specific German→Romansh translation. Because the German translations were generated automatically, they may contain errors. The dataset keeps the original fields of the source dataset and adds four new fields covering language variety prediction, backtranslation, and language identification scores. The README also describes the MT system used for the translations, the language classification methods, and the data filtering process.
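As an illustration of how the added fields might be used, the following is a minimal sketch that turns records into German→Romansh training pairs and drops records with a low language identification score. The field names (`text`, `backtranslation_de`, `lid_score`, `predicted_variety`), the threshold of 0.9, and the sample records are all assumptions made for this example; the actual field names and recommended filtering are given in the dataset README.

```python
from typing import Any, Dict, Iterable, Iterator

# Hypothetical field names, chosen for illustration only.
# Check the dataset README for the real ones.
ROMANSH_FIELD = "text"
GERMAN_FIELD = "backtranslation_de"
LID_SCORE_FIELD = "lid_score"
VARIETY_FIELD = "predicted_variety"


def build_pairs(
    records: Iterable[Dict[str, Any]],
    min_lid_score: float = 0.9,  # assumed threshold, not taken from the README
) -> Iterator[Dict[str, Any]]:
    """Yield German (source) -> Romansh (target) pairs from raw records."""
    for rec in records:
        german = (rec.get(GERMAN_FIELD) or "").strip()
        romansh = (rec.get(ROMANSH_FIELD) or "").strip()
        # Skip records that lack either side of the pair.
        if not german or not romansh:
            continue
        # Skip records whose language identification score is missing or low.
        score = rec.get(LID_SCORE_FIELD)
        if score is None or score < min_lid_score:
            continue
        yield {
            "source": german,
            "target": romansh,
            "variety": rec.get(VARIETY_FIELD),
        }


# Toy records for demonstration; these are not taken from the dataset.
sample_records = [
    {"text": "Bun di.", "backtranslation_de": "Guten Tag.",
     "lid_score": 0.98, "predicted_variety": "variety_a"},
    {"text": "Bun di.", "backtranslation_de": "Guten Tag.",
     "lid_score": 0.40, "predicted_variety": "variety_a"},
    {"text": "Bun di.", "backtranslation_de": "",
     "lid_score": 0.99, "predicted_variety": "variety_b"},
]

pairs = list(build_pairs(sample_records))
print(pairs)
```

With the Hugging Face `datasets` library, the records could instead come from `load_dataset("jvamvas/apertus-pretrain-romansh-backtranslated")`, after replacing the constants above with the field names documented in the README.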
Provider:
jvamvas



