abdoelsayed/ArabicaQA
Source: Hugging Face. Updated 2024-03-27. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/abdoelsayed/ArabicaQA
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
- found
license: mit
task_categories:
- question-answering
language:
- ar
pretty_name: ArabicaQA
size_categories:
- 10K<n<100K
---
# ArabicaQA
ArabicaQA: A Comprehensive Dataset for Arabic Question Answering
This repository contains the dataset for the paper *ArabicaQA: A Comprehensive Dataset for Arabic Question Answering*. The materials available in this repository are described below.
## Dataset
Within this folder, you will find the training, validation, and test sets of the ArabicaQA dataset. Refer to the table below for the dataset statistics:
| | Training | Validation | Test |
| -------------------|----------|------------|--------|
| MRC (with answers) | 62,186 | 13,483 | 13,426 |
| MRC (unanswerable) | 2,596 | 561 | 544 |
| Open-Domain | 62,057 | 13,475 | 13,414 |
| Open-Domain | 58,528 | 12,541 | 12,541 |
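
The MRC rows of the table can be combined to see how large each split is and how rare unanswerable questions are. The sketch below only does arithmetic on the counts copied from the table; the names `MRC_ANSWERABLE`, `MRC_UNANSWERABLE`, `mrc_totals`, and `unanswerable_share` are illustrative and are not part of the dataset or of any library.

```python
# Split sizes copied from the statistics table above.
# All names in this sketch are hypothetical helpers, not dataset fields.
SPLITS = ("Training", "Validation", "Test")
MRC_ANSWERABLE = {"Training": 62_186, "Validation": 13_483, "Test": 13_426}
MRC_UNANSWERABLE = {"Training": 2_596, "Validation": 561, "Test": 544}


def mrc_totals():
    """Total MRC questions per split (answerable + unanswerable)."""
    return {s: MRC_ANSWERABLE[s] + MRC_UNANSWERABLE[s] for s in SPLITS}


def unanswerable_share():
    """Fraction of MRC questions in each split that have no answer."""
    totals = mrc_totals()
    return {s: MRC_UNANSWERABLE[s] / totals[s] for s in SPLITS}


if __name__ == "__main__":
    totals = mrc_totals()
    for split, share in unanswerable_share().items():
        print(f"{split}: {totals[split]:,} MRC questions, {share:.1%} unanswerable")
```

With these counts, unanswerable questions make up roughly 4% of each MRC split, so a model evaluated on the MRC task sees far more answerable than unanswerable examples.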
## Citation
If you find this code or data useful, please consider citing our paper:
```
@misc{abdallah2024arabicaqa,
title={ArabicaQA: A Comprehensive Dataset for Arabic Question Answering},
author={Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
year={2024},
eprint={2403.17848},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
ArabicaQA is a comprehensive dataset for Arabic question answering with training, validation, and test sets. It covers machine reading comprehension (MRC), with both answerable and unanswerable questions, and open-domain question answering. The data is in Arabic, was created through crowdsourcing, is released under the MIT license, and falls in the 10K to 100K size category.
Provider:
abdoelsayed
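Summary of original information
ArabicaQA
Dataset overview
ArabicaQA is a comprehensive dataset for Arabic question answering.
Dataset details
- Dataset name: ArabicaQA
- Dataset type: question answering dataset
- Language: Arabic
- Dataset size: 10K<n<100K
- License: MIT
- Task category: question answering
- Data source: crowdsourced and found
Dataset statistics
| | Training | Validation | Test |
|---|---|---|---|
| MRC (with answers) | 62,186 | 13,483 | 13,426 |
| MRC (unanswerable) | 2,596 | 561 | 544 |
| Open-Domain | 62,057 | 13,475 | 13,414 |
| Open-Domain | 58,528 | 12,541 | 12,541 |
Citation
If you use this dataset, please cite the following paper: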
@misc{abdallah2024arabicaqa,
title={ArabicaQA: A Comprehensive Dataset for Arabic Question Answering},
author={Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
year={2024},
eprint={2403.17848},
archivePrefix={arXiv},
primaryClass={cs.CL}
}



