panagoa/tatoeba_raw_spa_kbd

Name: panagoa/tatoeba_raw_spa_kbd
Creator: panagoa
Published: 2025-03-01 10:14:30
License: not specified

Source: Hugging Face. Updated 2025-03-01; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/panagoa/tatoeba_raw_spa_kbd


Summary:

This dataset contains over 14 million sentence pairs, with source sentences in Spanish and their translations in Kabardian. Each entry includes metadata such as the original Tatoeba sentence ID, language codes, source sentences, target sentences, similarity scores, and the name of the model used. The dataset is valuable for Kabardian machine translation systems, linguistic research on low-resource languages, cross-lingual NLP applications, and the preservation and documentation of the Kabardian language.
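As a rough illustration of how the per-entry metadata described above might be used, the sketch below models one record and filters pairs by similarity score. The field names, the 0 to 1 score range, the model name, and the sample rows are all assumptions made for illustration; the listing does not give the actual column names, so check them against the dataset itself before relying on this.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SentencePair:
    # All field names are hypothetical; the real column names may differ.
    tatoeba_id: int      # original Tatoeba sentence ID
    source_lang: str     # language code of the source sentence
    target_lang: str     # language code of the target sentence
    source: str          # Spanish source sentence
    target: str          # Kabardian target sentence
    similarity: float    # similarity score (assumed here to lie in [0, 1])
    model: str           # name of the model used


def filter_by_similarity(pairs: Iterable[SentencePair], threshold: float) -> List[SentencePair]:
    """Keep only the pairs whose similarity score is at least `threshold`."""
    return [p for p in pairs if p.similarity >= threshold]


# Placeholder rows: the target strings are NOT real Kabardian translations.
sample = [
    SentencePair(1, "spa", "kbd", "Hola.", "<kbd translation 1>", 0.91, "example-model"),
    SentencePair(2, "spa", "kbd", "Gracias.", "<kbd translation 2>", 0.42, "example-model"),
]

kept = filter_by_similarity(sample, threshold=0.8)
print([p.tatoeba_id for p in kept])
```

A threshold filter like this is one simple way to trade corpus size for pair quality; a suitable cutoff would have to be chosen by inspecting the real score distribution.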

Provider:

panagoa
