ArabicMMLU

Name: ArabicMMLU
Creator: maas
Published: 2026-01-02 16:26:27
License: No description available

ModelScope Community: updated 2026-01-02, indexed 2025-03-22

Download link:

https://modelscope.cn/datasets/MBZUAI/ArabicMMLU

Resource description:

<img src="https://raw.githubusercontent.com/fajri91/eval_picts/master/ArabicMMLU-Bar.png" style="width: 100%;" id="title-icon">

Fajri Koto, Haonan Li, Sara Shatnawi, Jad Doughman, Abdelrahman Boda Sadallah, Aisha Alraeesi, Khalid Almubarak, Zaid Alyafeai, Neha Sengupta, Shady Shehata, Nizar Habash, Preslav Nakov, and Timothy Baldwin

<h4 align="left"> MBZUAI, Prince Sattam bin Abdulaziz University, KFUPM, Core42, NYU Abu Dhabi, The University of Melbourne </h4>

---

## Introduction

We present ArabicMMLU, the first multi-task language understanding benchmark for the Arabic language, sourced from school exams across diverse educational levels in different countries spanning North Africa, the Levant, and the Gulf regions. Our data comprises 40 tasks and 14,575 multiple-choice questions in Modern Standard Arabic (MSA), and was carefully constructed in collaboration with native speakers in the region.

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-circle.png?raw=true" style="width: 40%;" id="title-icon">

## Data

Each question in the dataset is a multiple-choice question with up to 5 choices, exactly one of which is the correct answer.

```
import datasets

data = datasets.load_dataset('MBZUAI/ArabicMMLU')
```

## Statistics

The data construction process involved a total of 10 Arabic native speakers from different countries: 6 internal workers (1 Jordanian, 1 Egyptian, 1 Lebanese, 1 from UAE, and 2 from KSA) and 4 external workers (3 Jordanian and 1 Egyptian). The resulting corpus is sourced from eight countries, with Jordan, Egypt, and Palestine being the top three sources. We categorize the collected questions into five subject areas: (1) STEM (Science, Technology, Engineering, and Mathematics); (2) Social Science; (3) Humanities; (4) Arabic Language; and (5) Others.

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-country.png?raw=true" style="width: 40%;" id="title-icon">

## Examples

These questions are written in Arabic.

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-ex2.png?raw=true" style="width: 40%;" id="title-icon">

## Evaluation

We evaluate 22 open-source multilingual models, 11 open-source Arabic-centric models, and 2 closed-source models. We experimented with different prompts in Arabic and English, and found that the English prompt performed best. Below is an example of the input with the prompt.

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-prompt.png?raw=true" style="width: 35%;" id="title-icon">

#### Zero-shot Evaluation

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-result.png?raw=true" style="width: 70%;" id="title-icon">

#### Few-shot Evaluation

<img src="https://github.com/fajri91/eval_picts/blob/master/ArabicMMLU-fewshot.png?raw=true" style="width: 35%;" id="title-icon">

## Citation

Please find our paper 📄<a href="https://aclanthology.org/2024.findings-acl.334/" target="_blank" style="margin-right: 15px; margin-left: 10px">here.</a>

```
@inproceedings{koto2024arabicmmlu,
  title={ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic},
  author={Fajri Koto and Haonan Li and Sara Shatanawi and Jad Doughman and Abdelrahman Boda Sadallah and Aisha Alraeesi and Khalid Almubarak and Zaid Alyafeai and Neha Sengupta and Shady Shehata and Nizar Habash and Preslav Nakov and Timothy Baldwin},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2024},
  year={2024}
}
```
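
The evaluation setup described above (an English-language prompt wrapped around an Arabic multiple-choice question, scored by whether the predicted choice matches the single correct answer) can be sketched roughly as follows. This is only an illustration: the prompt wording, the lettered `A`-`E` labels, and the helper names `format_prompt` and `accuracy` are assumptions made here, not the authors' actual template or evaluation code. The real prompt appears only in the figure above.

```python
from typing import Sequence

# Assumption: choices are labeled with Latin letters; the dataset has at most 5 choices.
LETTERS = "ABCDE"


def format_prompt(question: str, options: Sequence[str]) -> str:
    """Render one multiple-choice item with a hypothetical English instruction."""
    if not 2 <= len(options) <= len(LETTERS):
        raise ValueError("expected between 2 and 5 options")
    lines = [
        "This is a multiple-choice question. Answer with the letter of the correct choice.",
        "",
        f"Question: {question}",
    ]
    # One line per choice, e.g. "A. <option text>"
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append("Answer:")
    return "\n".join(lines)


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted letters that match the gold letters (case-insensitive)."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold))
    return correct / len(gold)
```

For example, `format_prompt("ما هي عاصمة مصر؟", ["القاهرة", "عمّان", "بيروت"])` yields a prompt ending in `Answer:`, and the model's returned letter can then be compared against the gold letter with `accuracy`. Mapping the dataset's actual columns onto `question` and `options` is left open here because the column names are not stated in this card.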


Provider:

maas

Created:

2025-03-17
