
alvations/xnli-15way

Source: Hugging Face. Updated: 2023-05-10. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/alvations/xnli-15way
Resource summary:
XNLI consists of 10k English sentences translated into 14 languages: ar (Arabic), bg (Bulgarian), de (German), el (Greek), es (Spanish), fr (French), hi (Hindi), ru (Russian), sw (Swahili), th (Thai), tr (Turkish), ur (Urdu), vi (Vietnamese), and zh (Chinese, Simplified). The XNLI 15-way parallel corpus can be used as an evaluation set for machine translation, in particular for low-resource languages such as Swahili or Urdu. We provide two files, xnli.15way.orig.tsv and xnli.15way.tok.tsv, containing the original and the tokenized versions of the corpus, respectively. Each file consists of 15 tab-separated columns, each corresponding to one language as indicated by the header.

Please consider citing the following paper if you use this dataset:

@InProceedings{conneau2018xnli,
  author = "Conneau, Alexis and Rinott, Ruty and Lample, Guillaume and Williams, Adina and Bowman, Samuel R. and Schwenk, Holger and Stoyanov, Veselin",
  title = "XNLI: Evaluating Cross-lingual Sentence Representations",
  booktitle = "Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing",
  year = "2018",
  publisher = "Association for Computational Linguistics",
  location = "Brussels, Belgium",
}
Provider:
alvations
Summary of original information

Dataset overview

Dataset name

XNLI

Dataset contents

  • Languages: 15 languages in total, namely:

    • Arabic (ar)
    • Bulgarian (bg)
    • German (de)
    • Greek (el)
    • English (en)
    • Spanish (es)
    • French (fr)
    • Hindi (hi)
    • Russian (ru)
    • Swahili (sw)
    • Thai (th)
    • Turkish (tr)
    • Urdu (ur)
    • Vietnamese (vi)
    • Chinese (Simplified) (zh)
  • Intended use: machine translation evaluation, particularly for low-resource languages such as Swahili or Urdu.

Dataset files

  • File 1: xnli.15way.orig.tsv - the original corpus.
  • File 2: xnli.15way.tok.tsv - the tokenized corpus.
  • File format: 15 tab-separated columns, one per language, as indicated by the header row.
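The file layout described above (a header row naming one language per column, then one tab-separated row per aligned sentence) can be read with a short Python sketch using only the standard csv module. It assumes UTF-8 files, that the header cells are language codes such as en or sw, and that no sentence contains a literal tab; read_15way and parallel_pairs are hypothetical helper names, not part of the dataset.

```python
import csv
from typing import Dict, Iterable, List, Optional, Tuple


def read_15way(lines: Iterable[str], expected_languages: Optional[int] = 15) -> Dict[str, List[str]]:
    """Parse a tab-separated parallel corpus whose first row names one language per column."""
    # QUOTE_NONE: sentences may contain double quotes, which must be kept verbatim.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty input: no header row")
    if expected_languages is not None and len(header) != expected_languages:
        raise ValueError(f"header has {len(header)} columns, expected {expected_languages}")
    # Assumption: header cells are language codes such as "en" or "sw".
    columns: Dict[str, List[str]] = {lang: [] for lang in header}
    for line_no, row in enumerate(reader, start=2):
        if not row:  # tolerate blank lines
            continue
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, got {len(row)}")
        for lang, sentence in zip(header, row):
            columns[lang].append(sentence)
    return columns


def parallel_pairs(columns: Dict[str, List[str]], src: str, tgt: str) -> List[Tuple[str, str]]:
    """Return aligned (source, reference) sentence pairs for one translation direction."""
    return list(zip(columns[src], columns[tgt]))


# Hypothetical usage (assumes the header contains "en" and "sw"):
# with open("xnli.15way.orig.tsv", encoding="utf-8", newline="") as f:
#     pairs = parallel_pairs(read_15way(f), "en", "sw")
```

Because every row is aligned across all 15 columns, any two columns form a parallel test set; for example, the en and sw columns could serve as source and reference for English-to-Swahili evaluation.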

Citation

  • Paper: Conneau, Alexis et al. "XNLI: Evaluating Cross-lingual Sentence Representations". Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2018.
  • Location: Brussels, Belgium.