AraDICE-ArabicMMLU-egy
Source: ModelScope community · Updated 2025-11-28 · Indexed 2025-06-21
Download link:
https://modelscope.cn/datasets/QCRI/AraDICE-ArabicMMLU-egy
Resource description:
# AraDiCE: Benchmarks for Dialectal and Cultural Capabilities in LLMs -- ArabicMMLU - Egyptian dialect
## Overview
The **AraDiCE** dataset is crafted to assess the dialectal and cultural understanding of large language models (LLMs) within Arabic-speaking contexts. It includes post-edited adaptations of several benchmark datasets, specifically curated to validate LLM performance in culturally and dialectally relevant scenarios for Arabic.
Within the AraDiCE collection, this particular subset is designated as **ArabicMMLU - Egyptian Dialect**.
## Dataset Usage
The AraDiCE dataset is intended for benchmarking and evaluating large language models, with a focus on:
- LLM performance on Arabic dialectal and cultural specifics.
- Dialectal variations in the Arabic language.
- Cultural context awareness in reasoning.
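
As a rough illustration of the benchmarking use case, the sketch below formats one multiple-choice item into a prompt and computes accuracy over a few predictions. The field names (`question`, `choices`, `answer`), the helper names, and the sample item are hypothetical placeholders, not the dataset's actual schema; check the dataset files for the real column names.

```python
from typing import Sequence

LETTERS = "ABCDE"


def format_prompt(question: str, choices: Sequence[str]) -> str:
    """Render a question and its options as a lettered multiple-choice prompt."""
    lines = [question]
    for letter, choice in zip(LETTERS, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted letters that match the gold letters (case-insensitive)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return correct / len(gold)


# Hypothetical item; real rows are in Egyptian Arabic and may use other field names.
item = {"question": "2 + 2 = ?", "choices": ["3", "4", "5"], "answer": "B"}
prompt = format_prompt(item["question"], item["choices"])
```

Keeping prompt formatting and scoring as separate functions makes it easy to swap in a different prompt template without touching the metric.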
## Evaluation
We used the [lm-harness](https://github.com/EleutherAI/lm-evaluation-harness) evaluation framework for benchmarking. The associated evaluation resources will be released soon. Stay tuned!
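
Multiple-choice tasks in lm-evaluation-harness are typically scored by comparing the model's log-likelihood for each answer option and selecting the highest. The sketch below illustrates only that selection rule in plain Python. It does not call the harness, it is not the authors' released evaluation code, and the function name and scores are made up for illustration.

```python
from typing import Sequence


def pick_choice(loglikelihoods: Sequence[float]) -> int:
    """Return the index of the option with the highest log-likelihood."""
    if not loglikelihoods:
        raise ValueError("need at least one option")
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])


# Made-up scores for a four-option question; index 2 is the least negative.
scores = [-4.1, -3.7, -1.2, -5.0]
predicted = pick_choice(scores)
is_correct = predicted == 2  # compared against a hypothetical gold index
```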
## Machine Translation Models
We will soon be releasing all our *machine translation models*. Stay tuned! For early access, feel free to contact us.
## License
The dataset is distributed under the **Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0)**. The full license text can be found in the accompanying `licenses_by-nc-sa_4.0_legalcode.txt` file.
## Citation
Please find the paper <a href="https://arxiv.org/pdf/2409.11404" target="_blank" style="margin-right: 15px; margin-left: 10px">here.</a>
```
@article{mousi2024aradicebenchmarksdialectalcultural,
title={{AraDiCE}: Benchmarks for Dialectal and Cultural Capabilities in LLMs},
author={Basel Mousi and Nadir Durrani and Fatema Ahmad and Md. Arid Hasan and Maram Hasanain and Tameem Kabbani and Fahim Dalvi and Shammur Absar Chowdhury and Firoj Alam},
year={2024},
publisher={arXiv:2409.11404},
url={https://arxiv.org/abs/2409.11404},
}
```
Provider:
maas
Created:
2025-06-17



