swiss-ai/apertus-pretrain-romansh
Source: Hugging Face. Updated 2025-09-02; indexed 2025-09-13.
Download link:
https://hf-mirror.com/datasets/swiss-ai/apertus-pretrain-romansh
Description:
This dataset consists of three parts: monolingual Romansh data; multilingual data, translated precisely from Romansh into German, French, Italian, or English; and synthetic data. The multilingual data is split into aligned and non-aligned subsets. The synthetic data was created by interleaving translation data and prefixing it with a sentence of the form "This is a text translated from SOURCE LANGUAGE to Rumantsch Grischun." Sources include legal texts, announcements, bilingual texts, online dictionaries, and Romansh Wikipedia content. The data has been preprocessed using a specific pipeline.
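The description does not spell out how the synthetic examples are assembled. The following is only a minimal sketch of one plausible reading: each source-language segment is followed by the prefix sentence and its Romansh counterpart. The function name `interleave_pairs`, the newline layout, and the sample sentence pair are all hypothetical and are not taken from the dataset or its preprocessing pipeline.

```python
from typing import Iterable, Tuple

# Prefix wording taken from the dataset description; SOURCE LANGUAGE is filled in per example.
PREFIX_TEMPLATE = "This is a text translated from {source_language} to Rumantsch Grischun."


def interleave_pairs(pairs: Iterable[Tuple[str, str]], source_language: str) -> str:
    """Interleave (source, Romansh) segment pairs into one text.

    Assumed layout (hypothetical): the source segment on one line, then the
    prefix sentence followed by the Romansh segment on the next line.
    """
    prefix = PREFIX_TEMPLATE.format(source_language=source_language)
    lines = []
    for source_segment, romansh_segment in pairs:
        lines.append(source_segment)
        lines.append(f"{prefix} {romansh_segment}")
    return "\n".join(lines)


# Placeholder pair for illustration only, not drawn from the dataset.
example = interleave_pairs([("Guten Tag.", "Bun di.")], "German")
print(example)
```

The actual pipeline may order, segment, or separate the texts differently; the sketch only illustrates the idea of pairing translated text with the stated prefix.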
Provider:
swiss-ai



