qcri/wiki_qa_ar

Name: qcri/wiki_qa_ar
Creator: qcri
Published: 2024-01-18 11:18:07
License: No description available

Source: Hugging Face. Updated 2024-01-18; indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/qcri/wiki_qa_ar

Overview:

This is the Arabic version of the WikiQA dataset, a monolingual Arabic question-answering dataset intended mainly for open-domain question answering. It was produced by automatic machine translation, with the best translation selected through crowdsourcing. Each data point contains a question, a candidate answer, and a label indicating whether the answer is correct (1) or not (0). Sample counts are given for the training, validation, and test sets.

Provider:

qcri

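Original Information Summary

Dataset Overview

Basic Information

Dataset name: English-Arabic Wikipedia Question-Answering
Language: Arabic
License: unknown
Multilinguality: monolingual
Dataset size: 100K<n<1M
Source datasets: original
Task category: question answering
Task ID: open-domain question answering
Papers with Code ID: wikiqaar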

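Dataset Structure

Features

question_id: question ID, a string.
question: question text, a string.
document_id: Wikipedia document ID, a string.
answer_id: answer ID, a string.
answer: candidate answer, a string.
label: class label; 1 means the answer is correct, 0 means it is incorrect.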
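
The feature list above can be read as one row per (question, candidate answer) pair. The following is a minimal sketch of that schema and of grouping rows by question, assuming rows arrive as plain dictionaries with exactly the six fields listed. The names `WikiQAArRow`, `correct_answers_by_question`, and `sample_rows` are hypothetical, and the sample rows are made-up placeholders, not records from the dataset.

```python
from collections import defaultdict
from typing import TypedDict


class WikiQAArRow(TypedDict):
    """One row, following the feature list on this page."""
    question_id: str
    question: str
    document_id: str
    answer_id: str
    answer: str
    label: int  # 1 = correct answer, 0 = incorrect answer


def correct_answers_by_question(rows):
    """Map each question_id to the answer_ids whose label is 1."""
    grouped = defaultdict(list)
    for row in rows:
        # Touch the key first so questions with no correct answer still appear.
        bucket = grouped[row["question_id"]]
        if row["label"] == 1:
            bucket.append(row["answer_id"])
    return dict(grouped)


# Placeholder rows for illustration only (NOT real dataset content).
sample_rows: list[WikiQAArRow] = [
    {"question_id": "Q1", "question": "placeholder question 1",
     "document_id": "D1", "answer_id": "A1-0",
     "answer": "placeholder answer", "label": 0},
    {"question_id": "Q1", "question": "placeholder question 1",
     "document_id": "D1", "answer_id": "A1-1",
     "answer": "placeholder answer", "label": 1},
    {"question_id": "Q2", "question": "placeholder question 2",
     "document_id": "D2", "answer_id": "A2-0",
     "answer": "placeholder answer", "label": 0},
]

if __name__ == "__main__":
    print(correct_answers_by_question(sample_rows))
```

Grouping by question_id is one plausible way to use the label field for answer selection; questions whose candidates are all labeled 0 map to an empty list rather than being dropped.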

Data Splits

Training set: 70,264 samples
Validation set: 10,387 samples
Test set: 20,632 samples
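
As a quick sanity check on the split sizes above, the sketch below sums them and computes each split's share of the total. The counts are copied from this page; the names `SPLIT_SIZES` and `split_summary` are illustrative, not part of any library.

```python
# Split sizes as listed on this page.
SPLIT_SIZES = {"train": 70_264, "validation": 10_387, "test": 20_632}


def split_summary(sizes):
    """Return the total sample count and each split's fraction of it."""
    total = sum(sizes.values())
    fractions = {name: count / total for name, count in sizes.items()}
    return total, fractions


total, fractions = split_summary(SPLIT_SIZES)

if __name__ == "__main__":
    print(total)  # 101283
    for name, fraction in fractions.items():
        print(f"{name}: {fraction:.1%}")
```

The total of 101,283 samples falls inside the 100K<n<1M size bucket given under Basic Information.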

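Dataset Creation

Source Data

The dataset was created by automatic machine translation, with the best translation selected through crowdsourcing.

Annotations

The dataset contains no additional annotations.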

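Considerations for Use

The dataset's license information is unknown.
The dataset's creation and annotation processes are not described in detail.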

Citation Information

@InProceedings{YangYihMeek:EMNLP2015:WikiQA,
  author  = {Yang, Yi and Yih, Wen-tau and Meek, Christopher},
  title   = "{WikiQA: A Challenge Dataset for Open-Domain Question Answering}",
  journal = {Association for Computational Linguistics},
  year    = 2015,
  doi     = {10.18653/v1/D15-1237},
  pages   = {2013--2018},
}