swiss-ai/apertus-pretrain-romansh

Name: swiss-ai/apertus-pretrain-romansh
Creator: swiss-ai
Published: 2025-09-02 07:15:40
License: not specified

Source: Hugging Face. Updated: 2025-09-02. Indexed: 2025-09-13.

Download link:

https://hf-mirror.com/datasets/swiss-ai/apertus-pretrain-romansh


Description:

This dataset consists of three parts: monolingual Romansh data; multilingual data with precise translations from Romansh into German, French, Italian, or English, split into aligned and unaligned subsets; and synthetic data. The synthetic data is created by interleaving translation data and prepending a sentence of the form "This is a text translated from SOURCE LANGUAGE to Rumantsch Grischun." Sources include legal texts, announcements, bilingual texts, online dictionaries, and Romansh Wikipedia content. The data has been preprocessed using a specific pipeline.
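
The synthetic-data step described above can be illustrated with a minimal sketch. The listing does not say how the segments are interleaved, so the alternating source/translation layout, the function name `build_synthetic_sample`, and the newline separator are all assumptions made for illustration; only the prefix sentence comes from the description.

```python
# Prefix sentence taken from the dataset description; SOURCE LANGUAGE is filled in per sample.
PREFIX_TEMPLATE = "This is a text translated from {source_language} to Rumantsch Grischun."


def build_synthetic_sample(source_language, segment_pairs):
    """Build one synthetic sample from (source_text, romansh_text) pairs.

    Assumption: "interleaving" means alternating each source segment with its
    Romansh translation. The real pipeline may use a different layout.
    """
    lines = [PREFIX_TEMPLATE.format(source_language=source_language)]
    for source_text, romansh_text in segment_pairs:
        lines.append(source_text.strip())
        lines.append(romansh_text.strip())
    return "\n".join(lines)


# Placeholder segments, not real data from the dataset.
pairs = [
    ("<German sentence 1>", "<Romansh sentence 1>"),
    ("<German sentence 2>", "<Romansh sentence 2>"),
]
print(build_synthetic_sample("German", pairs))
```

Each sample produced this way starts with the prefix line, followed by the source and Romansh segments in alternating order.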

Provider:

swiss-ai
