ymoslem/Tatoeba-Translations

Name: ymoslem/Tatoeba-Translations
Creator: ymoslem
Published: 2024-12-29 13:38:30
License: cc-by-2.0

Source: Hugging Face. Updated: 2024-12-29. Indexed: 2024-12-21.

Download link:

https://hf-mirror.com/datasets/ymoslem/Tatoeba-Translations

Description:

This is the latest version of the Tatoeba translations as of December 2024. The sentences were downloaded from the Tatoeba collection website and processed by mapping `sentences.tar.bz2` using `sentences_base.tar.bz2` to find the source (`sentence_src`) and target (`sentence_tgt`) sentence of each pair. The dataset contains 8,547,819 unique translation pairs across 414 languages and roughly 5,917 language pairs. Each example has seven fields: `id_src`, `lang_src`, `sentence_src`, `id_tgt`, `lang_tgt`, `sentence_tgt`, and `lang_pair`. The dataset is 1,144,194,352 bytes in size. It is licensed under cc-by-2.0, with task category translation and size category 1M<n<10M.
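As a minimal sketch of how the fields listed above might be used, the snippet below filters rows by translation direction and estimates the average example size from the figures in the description. Several things here are assumptions rather than facts from the listing: that the Hugging Face `datasets` library is installed, that the repository exposes a `train` split, and that the language codes look like `eng` or `fra`. The helper names and the sample rows (including their IDs and the `lang_pair` format) are made up for illustration.

```python
from typing import Dict, Iterable, Iterator

# Field names exactly as listed in the dataset description.
FIELDS = ("id_src", "lang_src", "sentence_src",
          "id_tgt", "lang_tgt", "sentence_tgt", "lang_pair")

Row = Dict[str, object]


def filter_direction(rows: Iterable[Row], src: str, tgt: str) -> Iterator[Row]:
    """Yield rows whose source and target language codes match (src, tgt)."""
    for row in rows:
        if row["lang_src"] == src and row["lang_tgt"] == tgt:
            yield row


def stream_tatoeba():
    """Stream the dataset from the Hugging Face Hub (needs network access).

    Assumptions: the `datasets` package is installed and the repository
    has a split named "train". Neither is stated in the listing.
    """
    from datasets import load_dataset  # imported lazily on purpose
    return load_dataset("ymoslem/Tatoeba-Translations",
                        split="train", streaming=True)


# Hypothetical rows in the documented schema; IDs, sentences, and the
# lang_pair format are invented for this example, not taken from the data.
SAMPLE = [
    {"id_src": 1, "lang_src": "eng", "sentence_src": "Hello.",
     "id_tgt": 2, "lang_tgt": "fra", "sentence_tgt": "Bonjour.",
     "lang_pair": "eng-fra"},
    {"id_src": 3, "lang_src": "deu", "sentence_src": "Danke.",
     "id_tgt": 4, "lang_tgt": "eng", "sentence_tgt": "Thanks.",
     "lang_pair": "deu-eng"},
]

eng_fra = list(filter_direction(SAMPLE, "eng", "fra"))

# Average size per example, from the totals given in the description.
TOTAL_BYTES = 1_144_194_352
TOTAL_EXAMPLES = 8_547_819
avg_bytes = TOTAL_BYTES / TOTAL_EXAMPLES

print(len(eng_fra), "matching sample row(s)")
print(f"about {avg_bytes:.0f} bytes per example")
```

To run the same filter on the real data, `filter_direction(stream_tatoeba(), "eng", "fra")` should work under the assumptions above. Streaming avoids downloading the full dataset (a little over 1 GB) before the first row is available.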

Provider:

ymoslem
