abdoelsayed/ArabicaQA

Source: Hugging Face. Updated 2024-03-27. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/abdoelsayed/ArabicaQA
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
- found
license: mit
task_categories:
- question-answering
language:
- ar
pretty_name: ArabicaQA
size_categories:
- 10K<n<100K
---

# ArabicaQA

ArabicaQA: A Comprehensive Dataset for Arabic Question Answering

This repository contains the dataset for the paper *ArabicaQA: A Comprehensive Dataset for Arabic Question Answering*. The materials available in this repository are described below.

## Dataset

This folder contains the training, validation, and test sets of the ArabicaQA dataset. The table below gives the dataset statistics:

|                    | Training | Validation | Test   |
| ------------------ | -------- | ---------- | ------ |
| MRC (with answers) | 62,186   | 13,483     | 13,426 |
| MRC (unanswerable) | 2,596    | 561        | 544    |
| Open-Domain        | 62,057   | 13,475     | 13,414 |
| Open-Domain        | 58,528   | 12,541     | 12,541 |

## Citation

If you find this code or data useful, please consider citing our paper:

```
@misc{abdallah2024arabicaqa,
  title={ArabicaQA: A Comprehensive Dataset for Arabic Question Answering},
  author={Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
  year={2024},
  eprint={2403.17848},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

ArabicaQA is a comprehensive dataset for Arabic question answering, split into training, validation, and test sets. It covers machine reading comprehension (MRC) with both answerable and unanswerable questions, as well as open-domain question answering. The data is in Arabic, was created through crowdsourcing, is released under the MIT license, and falls in the 10K to 100K size category.
Provider:
abdoelsayed
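Summary of original information

ArabicaQA

Dataset overview

ArabicaQA is a comprehensive dataset for Arabic question answering.

Dataset details

  • Dataset name: ArabicaQA
  • Dataset type: question answering dataset
  • Language: Arabic
  • Dataset size: 10K<n<100K
  • License: MIT
  • Task category: question answering
  • Data source: crowdsourced and found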

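Dataset statistics

|                    | Training | Validation | Test   |
| ------------------ | -------- | ---------- | ------ |
| MRC (with answers) | 62,186   | 13,483     | 13,426 |
| MRC (unanswerable) | 2,596    | 561        | 544    |
| Open-Domain        | 62,057   | 13,475     | 13,414 |
| Open-Domain        | 58,528   | 12,541     | 12,541 |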
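
To put the MRC rows of the statistics table in proportion, here is a minimal sketch that adds the answerable and unanswerable counts per split and reports the unanswerable share. The numbers are copied from the table; the dictionary and function names are illustrative only and are not part of the dataset or any library.

```python
# Counts copied from the MRC rows of the statistics table (per split).
mrc_answerable = {"train": 62186, "validation": 13483, "test": 13426}
mrc_unanswerable = {"train": 2596, "validation": 561, "test": 544}


def mrc_total(split: str) -> int:
    """Total number of MRC questions in a split (answerable + unanswerable)."""
    return mrc_answerable[split] + mrc_unanswerable[split]


def unanswerable_share(split: str) -> float:
    """Fraction of MRC questions in a split that are unanswerable."""
    return mrc_unanswerable[split] / mrc_total(split)


if __name__ == "__main__":
    for split in mrc_answerable:
        print(
            f"{split}: {mrc_total(split)} MRC questions, "
            f"{unanswerable_share(split):.1%} unanswerable"
        )
```

In every split, unanswerable questions make up roughly 4% of the MRC questions, so a model evaluated on this data sees far more answerable than unanswerable examples.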

Citation

If you use this dataset, please cite the following paper:

@misc{abdallah2024arabicaqa,
  title={ArabicaQA: A Comprehensive Dataset for Arabic Question Answering},
  author={Abdelrahman Abdallah and Mahmoud Kasem and Mahmoud Abdalla and Mohamed Mahmoud and Mohamed Elkasaby and Yasser Elbendary and Adam Jatowt},
  year={2024},
  eprint={2403.17848},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
