BeitTigreAI/tigre-data-parallel-multilingual
Source: Hugging Face | Updated: 2025-11-19 | Indexed: 2025-11-30
Download link:
https://hf-mirror.com/datasets/BeitTigreAI/tigre-data-parallel-multilingual
Summary:
The Tigre Parallel Multilingual Dataset (Tigre-Data 1.0) is a parallel corpus intended to accelerate research in low-resource NLP and in modeling morphologically rich languages. It provides a clean, high-quality aligned corpus for developing and evaluating machine translation (MT) systems for the Tigre language. The parallel sentences come from Tatoeba.org, where they were contributed by members of the Tigre diaspora, and are released under the CC-BY 2.0 license. The dataset is distributed in Parquet format and contains 329,554 parallel sentences across seven target languages.
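Since the corpus is published as Parquet on Hugging Face, one plausible way to load and inspect it is through the `datasets` library. The following is a minimal sketch, not the maintainers' documented procedure. The repository ID comes from the download link above; the `train` split, the optional configuration name, and the `target_lang` column are assumptions that should be checked against the actual dataset card.

```python
from collections import Counter

# Repository ID taken from the download link in this listing.
REPO_ID = "BeitTigreAI/tigre-data-parallel-multilingual"


def load_tigre(config=None, split="train"):
    """Load the corpus from the Hugging Face Hub.

    Assumptions (not confirmed by this listing): a split named "train"
    exists, and `config` is either None or a valid configuration name.
    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset  # imported lazily so the helper below works without it

    return load_dataset(REPO_ID, name=config, split=split)


def count_by_language(rows, lang_field="target_lang"):
    """Count sentence pairs per target language.

    `lang_field` is a hypothetical column name used for illustration;
    replace it with the column the dataset actually uses.
    """
    return Counter(row[lang_field] for row in rows)


if __name__ == "__main__":
    # Small in-memory example using the hypothetical column name.
    sample = [
        {"target_lang": "en"},
        {"target_lang": "ar"},
        {"target_lang": "en"},
    ]
    print(count_by_language(sample))
```

With the real data, `count_by_language(load_tigre())` would give a per-language breakdown of the sentence pairs, provided the assumed split and column names match the published schema.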
Provider:
BeitTigreAI



