alvations/xnli-15way
Source: Hugging Face. Updated 2023-05-10. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/alvations/xnli-15way
Description:
XNLI consists of 10k English sentences translated into 14 languages:
ar: Arabic
bg: Bulgarian
de: German
el: Greek
es: Spanish
fr: French
hi: Hindi
ru: Russian
sw: Swahili
th: Thai
tr: Turkish
ur: Urdu
vi: Vietnamese
zh: Chinese (Simplified)
The XNLI 15-way parallel corpus can be used as an evaluation set for machine translation, in particular for low-resource languages such as Swahili or Urdu.
We provide two files: xnli.15way.orig.tsv and xnli.15way.tok.tsv containing respectively the original and the tokenized version of the corpus.
The files consist of 15 tab-separated columns, each corresponding to one language as indicated by the header.
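As an illustration of the file layout described above, here is a minimal Python sketch that reads a tab-separated file with a language header and extracts one language pair. The function names `read_parallel_tsv` and `language_pair` are hypothetical, and the column codes used (`en`, `fr`, `de`) are an assumption based on the language codes listed above. The sample text is made up so the snippet runs without the dataset.

```python
import csv
import io


def read_parallel_tsv(handle):
    """Read a tab-separated parallel corpus into a list of {column: sentence} dicts."""
    # QUOTE_NONE treats quotation marks inside sentences as ordinary characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def language_pair(rows, src, tgt):
    """Return (source, target) sentence tuples for two header columns."""
    return [(row[src], row[tgt]) for row in rows]


# Made-up sample with three columns, for illustration only (not real corpus data).
sample = "en\tfr\tde\nHello .\tBonjour .\tHallo .\n"
rows = read_parallel_tsv(io.StringIO(sample))
pairs = language_pair(rows, "en", "fr")
print(pairs)
```

To read a real file, the same function can be given an open handle, for example `open("xnli.15way.orig.tsv", encoding="utf-8", newline="")`, assuming the file has been downloaded to the working directory.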
Please consider citing the following paper if using this dataset:
@InProceedings{conneau2018xnli,
author = "Conneau, Alexis
and Rinott, Ruty
and Lample, Guillaume
and Williams, Adina
and Bowman, Samuel R.
and Schwenk, Holger
and Stoyanov, Veselin",
title = "XNLI: Evaluating Cross-lingual Sentence Representations",
booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
year = "2018",
publisher = "Association for Computational Linguistics",
location = "Brussels, Belgium",
}
Provided by:
alvations
Summary of original information
Dataset overview
Dataset name
XNLI
Dataset contents
- Languages: 15 in total, English (the source language) plus the following 14:
- Arabic (ar)
- Bulgarian (bg)
- German (de)
- Greek (el)
- Spanish (es)
- French (fr)
- Hindi (hi)
- Russian (ru)
- Swahili (sw)
- Thai (th)
- Turkish (tr)
- Urdu (ur)
- Vietnamese (vi)
- Chinese (Simplified) (zh)
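- Intended use: machine translation evaluation, particularly for low-resource languages such as Swahili or Urdu.
Dataset files
- File 1: xnli.15way.orig.tsv, containing the original corpus.
- File 2: xnli.15way.tok.tsv, containing the tokenized corpus.
- File format: 15 tab-separated columns, one per language.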
Citation
- Paper: Conneau, Alexis et al. "XNLI: Evaluating Cross-lingual Sentence Representations". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018.
- Publication location: Brussels, Belgium.



