tarteel-ai/quranqa
Source: Hugging Face. Updated: 2024-09-11. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tarteel-ai/quranqa
Resource description:
QRCD (Quranic Reading Comprehension Dataset) comprises 1,093 question-passage pairs; coupled with their extracted answers, these form 1,337 question-passage-answer triplets. The dataset supports question answering, with partial reciprocal rank (pRR) as the evaluation metric. The language is Quranic Arabic, and the data was created and annotated by experts. The dataset structure covers data instances, data fields, and data splits. The dataset is licensed under CC-BY-ND 4.0.
Provider:
tarteel-ai
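Summary of original information
Dataset overview
Dataset name
- Name: Quranic Reading Comprehension Dataset (QRCD)
Basic dataset information
- Language: Quranic Arabic
- License: CC-BY-ND 4.0
- Multilinguality: monolingual
- Dataset size: ranges from under 1K to 10K
- Source data: original
- Tags: Quran, question answering
- Task category: question answering
- Task ID: extractive question answering
Dataset contents
- Composition: 1,093 question-passage pairs, which yield 1,337 question-passage-answer triplets.
- Data instance structure: each instance contains a passage, a question, and a list of one or more answers.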
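Dataset structure
- Data fields:
  - pq_id: sample ID
  - passage: context text
  - surah: chapter (surah) number
  - verses: verse range
  - question: question text
  - answers: list of answers with their start character positions
- Data splits:
  - Training set: 65%, 710 pairs
  - Development set: 10%, 109 pairs
  - Test set: 25%, 274 pairs
  - Total: 100%, 1,093 pairs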
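The fields and splits listed above can be sketched in Python as follows. This is a minimal illustration, not the dataset's official schema: the class names, the `start_char` key, the field types (for example `surah` as an integer and `verses` as a string), and the placeholder record are all assumptions made for the example. Only the top-level field names and the split sizes come from the listing.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Answer:
    text: str
    start_char: int  # assumed key name for the start character position


@dataclass
class QRCDRecord:
    # Top-level field names follow the listing; the types are assumptions.
    pq_id: str
    passage: str
    surah: int
    verses: str
    question: str
    answers: List[Answer]


def misaligned_answers(record: QRCDRecord) -> List[Answer]:
    """Return answers whose text is not found at the stated offset."""
    bad = []
    for answer in record.answers:
        end = answer.start_char + len(answer.text)
        if record.passage[answer.start_char:end] != answer.text:
            bad.append(answer)
    return bad


# Split sizes as listed above, counted in question-passage pairs.
SPLIT_SIZES: Dict[str, int] = {"train": 710, "dev": 109, "test": 274}


def split_shares(sizes: Dict[str, int]) -> Dict[str, int]:
    """Return each split's share of the total, rounded to a whole percent."""
    total = sum(sizes.values())
    return {name: round(100 * count / total) for name, count in sizes.items()}


# Placeholder record in Latin script; not real dataset content.
example = QRCDRecord(
    pq_id="example-id",
    passage="alpha beta gamma delta",
    surah=1,
    verses="1-2",
    question="placeholder question",
    answers=[Answer(text="beta gamma", start_char=6)],
)
```

Because the task is extractive, each answer should be recoverable from the passage by slicing at its start position, which is what `misaligned_answers` checks. `split_shares(SPLIT_SIZES)` reproduces the rounded 65/10/25 percentages from the listed pair counts.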
Evaluation metrics
- Official evaluation metric: partial reciprocal rank (pRR)
- Other evaluation metrics: exact match (EM) and F1@1
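As a rough illustration of the metrics named above, the sketch below computes exact match, a token-overlap F1, and a simplified pRR for a single question. It rests on several assumptions: tokens are split on whitespace with no Arabic-specific normalization, the partial matching score is taken to be the best token-level F1 against any gold answer, and pRR is read as that score for the first ranked prediction with non-zero overlap, divided by its rank. The function names are hypothetical and the official evaluation script may differ in its details.

```python
from collections import Counter
from typing import List


def exact_match(prediction: str, gold_answers: List[str]) -> float:
    """1.0 if the prediction equals any gold answer (ignoring outer spaces)."""
    return float(any(prediction.strip() == gold.strip() for gold in gold_answers))


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between one prediction and one gold answer."""
    pred_tokens = prediction.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, gold_answers: List[str]) -> float:
    """Best F1 of a prediction against any of the gold answers."""
    return max(token_f1(prediction, gold) for gold in gold_answers)


def partial_reciprocal_rank(ranked_predictions: List[str],
                            gold_answers: List[str]) -> float:
    """Simplified pRR: partial score of the first overlapping prediction / rank."""
    for rank, prediction in enumerate(ranked_predictions, start=1):
        score = best_f1(prediction, gold_answers)
        if score > 0:
            return score / rank
    return 0.0
```

Under this reading, F1@1 corresponds to `best_f1` applied to the top-ranked prediction only, and EM to `exact_match` on that same prediction. A system is rewarded by pRR both for ranking an overlapping answer early and for how closely that answer matches a gold span.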
Dataset creation
- License: CC-BY-ND 4.0
- Citation information:
@article{malhas2020ayatec,
  author = {Malhas, Rana and Elsayed, Tamer},
  title = {AyaTEC: Building a Reusable Verse-Based Test Collection for Arabic Question Answering on the Holy Qur’an},
  year = {2020},
  publisher = {Association for Computing Machinery},
  address = {New York, NY, USA},
  volume = {19},
  number = {6},
  issn = {2375-4699},
  url = {https://doi.org/10.1145/3400396},
  doi = {10.1145/3400396},
  journal = {ACM Trans. Asian Low-Resour. Lang. Inf. Process.},
  month = {oct},
  articleno = {78},
  numpages = {21},
  keywords = {evaluation, Classical Arabic}
}
Contributors
- Contributor: @piraka9011



