neulab/PangeaBench-tydiqa

Source: Hugging Face. Updated: 2024-11-01. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/neulab/PangeaBench-tydiqa
Description:

TyDi QA is a question answering dataset covering 11 typologically diverse languages with 204K question-answer pairs. The languages are diverse in their typology (the set of linguistic features that each language expresses), so models that perform well on this set are expected to generalize across a large number of the world's languages. The dataset contains language phenomena that would not be found in English-only corpora. To provide a realistic information-seeking task and avoid priming effects, questions are written by people who want to know the answer but do not know it yet (unlike SQuAD and its descendants), and the data is collected directly in each language without the use of translation (unlike MLQA and XQuAD).
Provider:
neulab