gplsi/alia_multilingual_parallel_sentences
Source: Hugging Face | Updated: 2026-02-09 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/gplsi/alia_multilingual_parallel_sentences
Description:
This dataset is built from parallel corpora for translation tasks and is intended for continual pretraining of language models. It provides aligned sentences in multiple languages to support multilingual learning. The data is stored in JSON Lines format: each line contains aligned sentences in multiple languages, and each sentence is prefixed with the full name of its language. Supported languages are Valencian (Valencià), Spanish (Español), and English.
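As a rough illustration of the record layout described above, the following sketch decodes one JSON Lines record and splits its prefixed sentences by language. The field name "text", the "<Language>: <sentence>" prefix form, the one-sentence-per-line layout, and the sample record are all assumptions made for this example; the listing does not state them, so check the actual files before relying on this.

```python
import json

# Language names as listed in the dataset description.
LANGUAGES = ("Valencià", "Español", "English")


def split_by_language(text, languages=LANGUAGES):
    """Map each language name to the sentence that follows its prefix.

    Assumption: one sentence per line, written as "<Language>: <sentence>".
    """
    sentences = {}
    for line in text.splitlines():
        name, sep, sentence = line.partition(":")
        if sep and name.strip() in languages:
            sentences[name.strip()] = sentence.strip()
    return sentences


def parse_jsonl_line(raw_line, field="text"):
    """Decode one JSON Lines record and split its text field by language.

    Assumption: the field name "text" is hypothetical, not from the listing.
    """
    record = json.loads(raw_line)
    return split_by_language(record[field])


# Made-up sample record for illustration only; not taken from the dataset.
sample = json.dumps(
    {"text": "Valencià: Bon dia.\nEspañol: Buenos días.\nEnglish: Good morning."},
    ensure_ascii=False,
)
parsed = parse_jsonl_line(sample)
```

If the real records use a different field name or separator, only the `field` argument and the `partition(":")` call should need adjusting.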
Provider:
gplsi



