
gmnlp/tico19

Source: Hugging Face. Updated 2021-10-03. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gmnlp/tico19
Description:

The TICO-19 evaluation set provides:

* Predefined dev and test splits. We provide English-XX translation files under both the `dev` and `test` directories.
* The dev set includes 971 sentences, and the test set includes 2100 sentences.
* The corresponding IDs are listed in the `dev.ids` and `test.ids` files.

The format of the files is:

~~~
{sourceLang}\t{targetLang}\t{sourceString}\t{targetString}\t{stringID}\t{sourceURL}\t{license}\t{translator_ID}
~~~

Currently available languages: Amharic (am), Arabic (ar), Bengali (bn), Kurdish Sorani (ckb), Latin American Spanish (es-LA), Farsi (fa), French (fr), Nigerian Fulfulde (fuv), Hausa (ha), Hindi (hi), Indonesian (id), Kurdish Kurmanji (ku), Lingala (ln), Luganda (lg), Marathi (mr), Malay (ms), Myanmar (my), Nepali (ne), Oromo (om), Dari (prs), Pashto (ps), Brazilian Portuguese (pt-BR), Russian (ru), Kinyarwanda (rw), Somali (so), kiSwahili (sw), Ethiopian Tigrinya (ti), Tagalog (tl), Urdu (ur), Chinese (Simplified) (zh), Zulu (zu).

All translations are released under a CC-0 license.
Provider:
gmnlp
Summary of original information

TICO-19 evaluation set overview

Dataset structure

  • Development set (dev): contains 971 sentences.
  • Test set (test): contains 2100 sentences.
  • ID files: the dev.ids and test.ids files list the corresponding IDs.
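A split can be sanity-checked against its ID file along the following lines. This is only a sketch: it assumes that `dev.ids` and `test.ids` hold one string ID per line, which this listing does not confirm, and the function name `check_split_ids` is hypothetical.

```python
def check_split_ids(record_ids: list[str], id_lines: list[str], expected: int) -> list[str]:
    """Return a list of problems; an empty list means the split looks consistent.

    record_ids: the stringID values read from a translation file.
    id_lines:   raw lines of the matching .ids file (assumed one ID per line).
    expected:   the sentence count stated for the split.
    """
    listed = [line.strip() for line in id_lines if line.strip()]
    problems = []
    if len(record_ids) != expected:
        problems.append(f"expected {expected} sentences, found {len(record_ids)}")
    missing = set(listed) - set(record_ids)  # listed in .ids but absent from the file
    extra = set(record_ids) - set(listed)    # present in the file but not listed
    if missing:
        problems.append(f"{len(missing)} listed IDs have no sentence")
    if extra:
        problems.append(f"{len(extra)} sentences are not listed")
    return problems
```

With the counts given above, `expected` would be 971 for the dev split and 2100 for the test split.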

File format

Each line of a file contains the following tab-separated fields:

{sourceLang}\t{targetLang}\t{sourceString}\t{targetString}\t{stringID}\t{sourceURL}\t{license}\t{translator_ID}
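A minimal sketch of reading such a file in Python. It assumes the eight fields are separated by tabs (as the dataset description states), that fields are not quoted, and that files are UTF-8 encoded. The names `TicoRecord`, `parse_tico_line`, and `read_tico_file` are illustrative and not part of the dataset.

```python
from typing import NamedTuple


class TicoRecord(NamedTuple):
    """One translation pair; field order follows the format line above."""
    source_lang: str
    target_lang: str
    source_string: str
    target_string: str
    string_id: str
    source_url: str
    license: str
    translator_id: str


def parse_tico_line(line: str) -> TicoRecord:
    """Split one tab-separated line into its eight fields."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(TicoRecord._fields):
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    return TicoRecord(*fields)


def read_tico_file(path: str) -> list[TicoRecord]:
    """Read every non-empty line of a file (assumed UTF-8, no quoting)."""
    with open(path, encoding="utf-8") as handle:
        return [parse_tico_line(line) for line in handle if line.strip()]
```

If a file begins with a header row, it will be parsed as an ordinary record, so callers may need to skip the first entry.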

Supported languages

  • Amharic (am)
  • Arabic (ar)
  • Bengali (bn)
  • Kurdish Sorani (ckb)
  • Latin American Spanish (es-LA)
  • Farsi (fa)
  • French (fr)
  • Nigerian Fulfulde (fuv)
  • Hausa (ha)
  • Hindi (hi)
  • Indonesian (id)
  • Kurdish Kurmanji (ku)
  • Lingala (ln)
  • Luganda (lg)
  • Marathi (mr)
  • Malay (ms)
  • Myanmar (my)
  • Nepali (ne)
  • Oromo (om)
  • Dari (prs)
  • Pashto (ps)
  • Brazilian Portuguese (pt-BR)
  • Russian (ru)
  • Kinyarwanda (rw)
  • Somali (so)
  • kiSwahili (sw)
  • Ethiopian Tigrinya (ti)
  • Tagalog (tl)
  • Urdu (ur)
  • Chinese (Simplified) (zh)
  • Zulu (zu)

License

All translations are released under a CC-0 license.
