A benchmark Arabic dataset for question classification with AAFAQ taxonomy

Source: DataONE. Updated 2025-07-17; indexed 2025-08-02.

Download link:

https://search.dataone.org/view/sha256:bc9651e0e1e2f7d0885508f6b8e03904716828db104dee5f264302aae68b86d1

Resource description:

Arabic Natural Language Processing (NLP) is hampered by the morphological complexity of the language and by a shortage of high-quality annotated resources. In this work, we introduce the AAFAQ Dataset, an open-domain resource for developing semantic and cognitive question classification in Modern Standard Arabic (MSA). The dataset consists of 5,009 records annotated with rich attributes such as Question Tool, Intent, Answer Type, Cognitive Level, and Temporal Context, among others. Built on the AAFAQ Taxonomy, which symbolizes the "horizons" of question understanding, the dataset extends the frontier of Arabic question answering systems (QAS) by capturing the semantic and contextual intricacies of Arabic questions. Its utility was tested by fine-tuning AraBERT on the dataset, which gave very high classification performance; integration with Alpaca + Gemma-9B Unsloth models showed improved metrics when leveraging multi-attribute classification. This provides a comprehensive resource for Arabic ...

# A benchmark Arabic dataset for question classification with AAFAQ taxonomy

[https://doi.org/10.5061/dryad.9w0vt4brx](https://doi.org/10.5061/dryad.9w0vt4brx)

## Description of the data and file structure

The AAFAQ Dataset was collected and validated through experiments on fine-tuning Arabic NLP models, such as AraBERT, for multi-label question classification. Additional experiments integrated the dataset with generative answering systems such as Alpaca + Gemma-9B Unsloth to improve metrics for multi-attribute classification.

The AAFAQ Dataset is a comprehensive Arabic dataset designed for semantic and cognitive question classification in Modern Standard Arabic (MSA). It consists of 5,009 records annotated with a variety of attributes, including Question Tool, Intent, Answer Type, Cognitive Level, and Temporal Context, among others. It serves as a benchmark resource for research in Arabic NLP, advancing fields such as education, cognitive re...
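
To make the multi-attribute annotation concrete, the following is a minimal sketch of how records carrying several attribute labels might be turned into integer targets for a classifier. Only the five attribute names come from the description; the record layout, the `question` key, the sample questions, and every label value are assumptions made for illustration, and the real files may be organized differently.

```python
from collections import defaultdict

# Attribute names are taken from the dataset description. The record layout,
# the "question" key, and every label value below are invented for illustration.
ATTRIBUTES = [
    "Question Tool",
    "Intent",
    "Answer Type",
    "Cognitive Level",
    "Temporal Context",
]


def build_vocabs(records, attributes=ATTRIBUTES):
    """Collect the labels seen for each attribute and assign them integer ids."""
    seen = defaultdict(set)
    for record in records:
        for attr in attributes:
            seen[attr].add(record[attr])
    # Sorting makes the label-to-id mapping deterministic across runs.
    return {
        attr: {label: idx for idx, label in enumerate(sorted(seen[attr]))}
        for attr in attributes
    }


def encode(record, vocabs, attributes=ATTRIBUTES):
    """Return one integer label id per attribute, in ATTRIBUTES order."""
    return [vocabs[attr][record[attr]] for attr in attributes]


# Hypothetical rows; the real dataset's columns and label sets may differ.
sample = [
    {
        "question": "ما عاصمة مصر؟",
        "Question Tool": "what",
        "Intent": "factual",
        "Answer Type": "location",
        "Cognitive Level": "remember",
        "Temporal Context": "present",
    },
    {
        "question": "متى بدأت الحرب العالمية الثانية؟",
        "Question Tool": "when",
        "Intent": "factual",
        "Answer Type": "date",
        "Cognitive Level": "remember",
        "Temporal Context": "past",
    },
]

vocabs = build_vocabs(sample)
encoded = [encode(record, vocabs) for record in sample]
```

Each position in an encoded row could serve as the target for a separate classification head on top of a shared encoder such as AraBERT. That pairing is one plausible design, not a setup the description confirms.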

Created:

2025-07-18
