gmnlp/tico19
Source: Hugging Face. Updated: 2021-10-03. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/gmnlp/tico19
Description:
The TICO-19 evaluation set provides:
* Predefined dev and test splits. We provide English-XX translation files under both the `dev` and `test` directories.
* The dev set includes 971 sentences, and the test set includes 2100 sentences.
* The corresponding IDs are listed in the `dev.ids` and `test.ids` files.
The format of the files is:
~~~
{sourceLang}\t{targetLang}\t{sourceString}\t{targetString}\t{stringID}\t{sourceURL}\t{license}\t{translator_ID}
~~~
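The eight-column layout above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: the function name `read_tico19` is hypothetical, and it assumes each file is UTF-8 text with one record per line. Because this description does not say whether the files start with a header row, the sketch skips a first row only if it exactly matches the field names.

```python
import io
from typing import Dict, Iterator, TextIO

# Column order taken from the format line in the dataset description.
FIELDS = [
    "sourceLang", "targetLang", "sourceString", "targetString",
    "stringID", "sourceURL", "license", "translator_ID",
]


def read_tico19(handle: TextIO) -> Iterator[Dict[str, str]]:
    """Yield one dict per tab-separated record (hypothetical helper)."""
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        row = line.split("\t")
        if line_no == 1 and row == FIELDS:
            continue  # assumption: a header row, if present, repeats the field names
        if len(row) != len(FIELDS):
            raise ValueError(
                f"line {line_no}: expected {len(FIELDS)} fields, got {len(row)}"
            )
        yield dict(zip(FIELDS, row))


if __name__ == "__main__":
    # Made-up record for illustration only; it is not taken from the dataset.
    sample = (
        "en\tfr\tWash your hands.\tLavez-vous les mains.\t"
        "example_1\thttps://example.org\tCC0\tuser_1\n"
    )
    for record in read_tico19(io.StringIO(sample)):
        print(record["stringID"], record["targetLang"], record["targetString"])
```

For a real file, pass an open handle instead of the in-memory sample, for example `open(path, encoding="utf-8")`, where `path` points to one of the files under `dev` or `test`.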
Currently available languages:
* Amharic (am)
* Arabic (ar)
* Bengali (bn)
* Kurdish Sorani (ckb)
* Latin American Spanish (es-LA)
* Farsi (fa)
* French (fr)
* Nigerian Fulfulde (fuv)
* Hausa (ha)
* Hindi (hi)
* Indonesian (id)
* Kurdish Kurmanji (ku)
* Lingala (ln)
* Luganda (lg)
* Marathi (mr)
* Malay (ms)
* Myanmar (my)
* Nepali (ne)
* Oromo (om)
* Dari (prs)
* Pashto (ps)
* Brazilian Portuguese (pt-BR)
* Russian (ru)
* Kinyarwanda (rw)
* Somali (so)
* kiSwahili (sw)
* Ethiopian Tigrinya (ti)
* Tagalog (tl)
* Urdu (ur)
* Chinese (Simplified) (zh)
* Zulu (zu)
All translations are released under a CC-0 license.
Provider:
gmnlp
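Summary of original information
TICO-19 evaluation set overview
Dataset structure
- Development set (dev): 971 sentences.
- Test set (test): 2100 sentences.
- ID files:
The dev.ids and test.ids files list the corresponding IDs.
File format
Each file contains the following tab-separated fields:
{sourceLang}\t{targetLang}\t{sourceString}\t{targetString}\t{stringID}\t{sourceURL}\t{license}\t{translator_ID}
Supported languages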
- Amharic (am)
- Arabic (ar)
- Bengali (bn)
- Kurdish Sorani (ckb)
- Latin American Spanish (es-LA)
- Farsi (fa)
- French (fr)
- Nigerian Fulfulde (fuv)
- Hausa (ha)
- Hindi (hi)
- Indonesian (id)
- Kurdish Kurmanji (ku)
- Lingala (ln)
- Luganda (lg)
- Marathi (mr)
- Malay (ms)
- Myanmar (my)
- Nepali (ne)
- Oromo (om)
- Dari (prs)
- Pashto (ps)
- Brazilian Portuguese (pt-BR)
- Russian (ru)
- Kinyarwanda (rw)
- Somali (so)
- kiSwahili (sw)
- Ethiopian Tigrinya (ti)
- Tagalog (tl)
- Urdu (ur)
- Chinese (Simplified) (zh)
- Zulu (zu)
License
All translations are released under a CC-0 license.



