
AraDiCE

ModelScope community. Updated 2025-12-05. Indexed 2025-06-21.
Download link:
https://modelscope.cn/datasets/QCRI/AraDiCE
Resource description:
# AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

## Overview

The **AraDiCE** dataset is designed to evaluate dialectal and cultural capabilities in large language models (LLMs). The dataset consists of post-edited versions of various benchmark datasets, curated for validation in cultural and dialectal contexts relevant to Arabic.

As part of the supplemental materials, we have selected a few datasets (see below) for the reader to review. We will make the full AraDiCE benchmarking suite publicly available to the community.

## AraDiCE Collection

The AraDiCE collection can be accessed through the [collection page](https://huggingface.co/collections/QCRI/aradice-6727765839bf89aa78e9f132).

Individual datasets can also be accessed through the following links:

- [ArabicMMLU-lev](https://huggingface.co/datasets/QCRI/AraDICE-ArabicMMLU-lev)
- [ArabicMMLU-egy](https://huggingface.co/datasets/QCRI/AraDICE-ArabicMMLU-egy)
- [AraDiCE-Culture](https://huggingface.co/datasets/QCRI/AraDiCE-Culture)
- [BoolQ](https://huggingface.co/datasets/QCRI/AraDiCE-BoolQ)
- [OpenBookQA (OBQA)](https://huggingface.co/datasets/QCRI/AraDiCE-OpenBookQA)
- [PIQA](https://huggingface.co/datasets/QCRI/AraDiCE-PIQA)
- [TruthfulQA](https://huggingface.co/datasets/QCRI/AraDiCE-TruthfulQA)
- [AraDiCE-WinoGrande](https://huggingface.co/datasets/QCRI/AraDiCE-WinoGrande)

## Machine Translation (MT) Models Used for AraDiCE

Along with the AraDiCE collection, we provide machine translation (MT) models tailored for specific Arabic dialects. These models translate from Modern Standard Arabic (MSA) into two prominent Arabic dialects: Levantine and Egyptian. The models use state-of-the-art neural translation methods, with the aim of high accuracy and contextual relevance.
You can access and download the MT models using the following links:

- **MSA to Levantine Dialect Model:** [AraDiCE-msa-to-lev](https://huggingface.co/QCRI/AraDiCE-msa-to-lev)
  This model translates text from MSA into the Levantine Arabic dialect, commonly spoken in countries like Lebanon, Syria, Jordan, and Palestine.
- **MSA to Egyptian Dialect Model:** [AraDiCE-msa-to-egy](https://huggingface.co/QCRI/AraDiCE-msa-to-egy)
  This model translates text from MSA into Egyptian Arabic, widely spoken and understood across Egypt and in other Arabic-speaking regions due to its cultural prominence.

These models are hosted on Hugging Face for easy access and integration into applications.

## Dataset Statistics

The datasets used in this study include: *i)* four existing Arabic datasets for understanding and generation: *Arabic Dialects Dataset (ADD)*, *ADI*, *QADI*, along with a dialectal response generation dataset, and *MADAR*; *ii)* seven datasets translated and post-edited into MSA and dialects (Levantine and Egyptian), which include *ArabicMMLU*, *BoolQ*, *PIQA*, *OBQA*, *Winogrande*, *Belebele*, and *TruthfulQA*; and *iii)* *AraDiCE-Culture*, an in-house developed regional Arabic cultural understanding dataset.

The figures below show the types of datasets benchmarked in **AraDiCE** and their statistics.

<p align="left"> <img src="./benchmarking_tasks_datasets.png" style="width: 40%;" id="title-icon"> </p>

<p align="left"> <img src="./data_stat_table.png" style="width: 40%;" id="title-icon"> </p>

## Dataset Usage

The AraDiCE dataset is intended for benchmarking and evaluating large language models, specifically focusing on:

- Assessing the performance of LLMs on Arabic dialectal and cultural specifics.
- Dialectal variations in the Arabic language.
- Cultural context awareness in reasoning.

## Evaluation

We used the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) evaluation framework for the benchmarking.
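To make the cross-dialect comparison concrete, here is a minimal sketch of scoring multiple-choice items separately for each language variety. It is an illustration only, not the lm-evaluation-harness implementation: the record layout (`variety`, `scores`, `gold`) and the function name are assumptions introduced for this example.

```python
from collections import defaultdict


def accuracy_by_variety(records):
    """Compute multiple-choice accuracy separately for each language variety.

    Each record is a dict with hypothetical keys (not defined by AraDiCE):
      "variety": a label such as "msa", "lev", or "egy"
      "scores":  one model score per answer option (higher is better)
      "gold":    index of the correct option
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        scores = rec["scores"]
        # The predicted option is the one with the highest score.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        total[rec["variety"]] += 1
        correct[rec["variety"]] += int(predicted == rec["gold"])
    return {variety: correct[variety] / total[variety] for variety in total}


if __name__ == "__main__":
    # Toy scores, invented purely to exercise the function.
    toy_records = [
        {"variety": "msa", "scores": [-1.2, -0.3], "gold": 1},
        {"variety": "egy", "scores": [-0.9, -0.1, -3.0], "gold": 1},
        {"variety": "lev", "scores": [-0.7, -0.6], "gold": 0},
    ]
    print(accuracy_by_variety(toy_records))
```

Reporting one number per variety keeps a gap between MSA and dialectal performance visible instead of averaging it away. The official scoring is whatever the lm-evaluation-harness task configuration defines.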
The AraDiCE benchmark integration is currently in a [pull request](https://github.com/EleutherAI/lm-evaluation-harness/pull/2507) on *lm-evaluation-harness*.

<!-- We will soon release them. Stay tuned!! -->

## License

The dataset is distributed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)**. The full license text can be found in the accompanying `licenses_by-nc-sa_4.0_legalcode.txt` file.

## Citation

The paper is available <a href="https://arxiv.org/pdf/2409.11404" target="_blank" style="margin-right: 15px; margin-left: 10px">here</a> and has been accepted at [COLING 2025](https://coling2025.org/). If you use all or any specific dataset in this collection, please make sure to also cite the original dataset paper. You will find the citations in our paper.

```
@article{mousi2024aradicebenchmarksdialectalcultural,
  title={{AraDiCE}: Benchmarks for Dialectal and Cultural Capabilities in LLMs},
  author={Basel Mousi and Nadir Durrani and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain and Tameem Kabbani and Fahim Dalvi and Shammur Absar Chowdhury and Firoj Alam},
  year={2024},
  publisher={arXiv:2409.11404},
  url={https://arxiv.org/abs/2409.11404},
}
```
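The individual datasets in the collection can be addressed by their Hugging Face repository IDs. The sketch below copies those IDs from the links in this README (including the `AraDICE` capitalization in the two ArabicMMLU URLs). The helper names `dataset_url` and `load_aradice` are hypothetical. Configuration and split names are not listed in this README, so treat them as something to look up on each dataset page.

```python
# Repository IDs copied from the collection links in this README.
ARADICE_DATASETS = {
    "ArabicMMLU-lev": "QCRI/AraDICE-ArabicMMLU-lev",
    "ArabicMMLU-egy": "QCRI/AraDICE-ArabicMMLU-egy",
    "AraDiCE-Culture": "QCRI/AraDiCE-Culture",
    "BoolQ": "QCRI/AraDiCE-BoolQ",
    "OpenBookQA": "QCRI/AraDiCE-OpenBookQA",
    "PIQA": "QCRI/AraDiCE-PIQA",
    "TruthfulQA": "QCRI/AraDiCE-TruthfulQA",
    "WinoGrande": "QCRI/AraDiCE-WinoGrande",
}


def dataset_url(name: str) -> str:
    """Return the Hugging Face page URL for a dataset listed above."""
    return f"https://huggingface.co/datasets/{ARADICE_DATASETS[name]}"


def load_aradice(name: str, config=None, **kwargs):
    """Load one dataset with the `datasets` library (needs network access).

    Assumption: `config` (and any `split=` passed through kwargs) must match
    what the dataset page actually offers; this README does not specify them.
    """
    # Imported lazily so the registry above works without the dependency.
    from datasets import load_dataset

    return load_dataset(ARADICE_DATASETS[name], name=config, **kwargs)


if __name__ == "__main__":
    for short_name in ARADICE_DATASETS:
        print(short_name, "->", dataset_url(short_name))
```

Keeping the IDs in one mapping makes it easy to loop over the whole suite, and the lazy import means the URL helper still works where `datasets` is not installed.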

Provider:
maas
Created:
2025-06-17