gplsi/alia_multilingual_parallel_sentences

Name: gplsi/alia_multilingual_parallel_sentences
Creator: gplsi
Published: 2026-02-09 11:51:25
License: No description available

Hugging Face | Updated: 2026-02-09 | Indexed: 2025-12-20

Download link:

https://hf-mirror.com/datasets/gplsi/alia_multilingual_parallel_sentences

Description:

The dataset is built from parallel translation corpora and is intended for continual pretraining of language models. It provides aligned sentences in multiple languages to support multilingual learning. The data is stored in a JSON Lines file in which each line holds one set of aligned sentences, each sentence prefixed with the full name of its language. Supported languages include Valencià, Español, and English.
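A minimal Python sketch of reading one record follows. The card does not document the exact JSON schema, so the field name `text`, the newline between sentences, and the `"<Language>: <sentence>"` prefix form are assumptions, and the example record is made up for illustration. Check a real line of the file before relying on it.

```python
import json

# Language-name prefixes listed on the dataset card.
LANGUAGES = ("Valencià", "Español", "English")


def parse_record(line: str, field: str = "text") -> dict[str, str]:
    """Split one JSON Lines record into {language: sentence}.

    Assumption (hypothetical schema): the record stores its aligned
    sentences under `field`, one per line, each as "<Language>: <sentence>".
    """
    record = json.loads(line)
    pairs = {}
    for row in record[field].splitlines():
        # partition splits on the first ": " only, so colons inside
        # the sentence itself are preserved.
        lang, sep, sentence = row.partition(": ")
        if sep and lang in LANGUAGES:
            pairs[lang] = sentence.strip()
    return pairs


# Made-up record, not taken from the dataset.
example = json.dumps(
    {"text": "Valencià: Bon dia.\nEspañol: Buenos días.\nEnglish: Good morning."},
    ensure_ascii=False,
)
print(parse_record(example))
# {'Valencià': 'Bon dia.', 'Español': 'Buenos días.', 'English': 'Good morning.'}
```

If the real records use a different field name or separator, only the `field` argument and the `partition` call need to change.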

Provided by:

gplsi
