tiiuae/evalplus-arabic
Source: Hugging Face. Updated 2026-02-14; indexed 2026-02-07.
Download link: https://hf-mirror.com/datasets/tiiuae/evalplus-arabic
---
dataset_info:
- config_name: humanevalplus-arabic
  features:
  - name: task_id
    dtype: string
  - name: prompt
    dtype: string
  - name: canonical_solution
    dtype: string
  - name: entry_point
    dtype: string
  - name: test
    dtype: string
  splits:
  - name: test
    num_bytes: 10978353
    num_examples: 164
  download_size: 2907286
  dataset_size: 10978353
- config_name: mbppplus-arabic
  features:
  - name: task_id
    dtype: int64
  - name: code
    dtype: string
  - name: prompt
    dtype: string
  - name: source_file
    dtype: string
  - name: test_imports
    dtype: string
  - name: test_list
    dtype: string
  - name: test
    dtype: string
  splits:
  - name: test
    num_bytes: 4855903
    num_examples: 378
  download_size: 1132190
  dataset_size: 4855903
configs:
- config_name: humanevalplus-arabic
  data_files:
  - split: test
    path: humanevalplus-arabic/test-*
- config_name: mbppplus-arabic
  data_files:
  - split: test
    path: mbppplus-arabic/test-*
---
# 3LM Code Arabic Benchmark
## Dataset Summary
This dataset includes Arabic translations of two widely-used code evaluation benchmarks — HumanEval+ and MBPP+ — adapted into Arabic for the first time as part of the 3LM project. It includes both the base and plus versions with extended unit test coverage.
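As a minimal sketch of how the two configurations might be loaded, the snippet below uses the Hugging Face `datasets` library with the repository ID, config names, and `test` split listed in this card's metadata. The helper name `load_benchmark` is hypothetical, and the actual call needs network access and the `datasets` package.

```python
# Repository ID, config names, and split come from the card's metadata.
REPO_ID = "tiiuae/evalplus-arabic"
CONFIGS = ("humanevalplus-arabic", "mbppplus-arabic")


def load_benchmark(config_name: str):
    """Load the test split of one config (hypothetical helper)."""
    if config_name not in CONFIGS:
        raise ValueError(f"unknown config {config_name!r}; expected one of {CONFIGS}")
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="test")
```

Going by the metadata, `load_benchmark("humanevalplus-arabic")` should yield 164 examples and `load_benchmark("mbppplus-arabic")` 378.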
## Motivation
Arabic LLMs lack meaningful benchmarks to assess code generation abilities. This dataset bridges that gap by providing high-quality Arabic natural language descriptions aligned with formal Python test cases.
## Dataset Structure
### `humanevalplus-arabic`
- `task_id`: Unique identifier (e.g., HumanEval/18)
- `prompt`: Task description in Arabic
- `entry_point`: Function name
- `canonical_solution`: Reference Python implementation
- `test`: Test cases
```json
{
  "task_id": "HumanEval/3",
  "prompt": "لديك قائمة من عمليات الإيداع والسحب في حساب بنكي يبدأ برصيد صفري. مهمتك هي اكتشاف إذا في أي لحظة انخفض رصيد الحساب إلى ما دون الصفر، وفي هذه اللحظة يجب أن تعيد الدالة True. وإلا فيجب أن تعيد False.",
  "entry_point": "below_zero",
  "canonical_solution": "...",
  "test": "..."
}
```
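The fields above can be combined to score a generated solution. The following is a sketch only: it assumes the `test` string defines a `check(candidate)` function, as upstream HumanEval does, which this card does not confirm. The helper name and the toy record are hypothetical and are not copied from the dataset.

```python
def run_humaneval_style_check(solution_code: str, test_code: str, entry_point: str) -> bool:
    """Return True if the solution passes the test code, False otherwise.

    Assumption: `test_code` defines `check(candidate)`.
    exec() is used without a sandbox, so only run code you trust.
    """
    namespace: dict = {}
    try:
        exec(solution_code, namespace)  # defines the entry-point function
        exec(test_code, namespace)      # assumed to define check(candidate)
        namespace["check"](namespace[entry_point])
    except Exception:
        return False
    return True


# Toy record for illustration only; not taken from the dataset.
toy = {
    "entry_point": "below_zero",
    "solution": (
        "def below_zero(operations):\n"
        "    balance = 0\n"
        "    for op in operations:\n"
        "        balance += op\n"
        "        if balance < 0:\n"
        "            return True\n"
        "    return False\n"
    ),
    "test": (
        "def check(candidate):\n"
        "    assert candidate([1, 2, -4, 5]) is True\n"
        "    assert candidate([1, 2, 3]) is False\n"
    ),
}

passed = run_humaneval_style_check(toy["solution"], toy["test"], toy["entry_point"])
print(passed)  # True for this toy record
```

For real evaluation, generated code should run in an isolated process with a timeout rather than through a bare `exec()`.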
<br>
### `mbppplus-arabic`
- `task_id`: Unique identifier (e.g., 2)
- `prompt`: Task description in Arabic
- `code`: Canonical Python solution
- `source_file`: Path of the original MBPP problem file
- `test_imports`: Import statements required by the tests
- `test_list`: 3 Python `assert` statements for the task
- `test`: Test cases
```json
{
  "task_id": 2,
  "code": "def similar_elements(test_tup1, test_tup2):\n    return tuple(set(test_tup1) & set(test_tup2))",
  "prompt": "اكتب دالة للعثور على العناصر المشتركة من القائمتين المعطاتين.",
  "source_file": "Benchmark Questions Verification V2.ipynb",
  "test_imports": "[]",
  "test_list": "...",
  "test": "..."
}
```
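Because `test_imports` and `test_list` are stored as strings, they need to be parsed before use. The sketch below assumes both hold Python list literals (the `"[]"` value in the example suggests this, but the card does not state it). The helper name and the three assert strings are written for this illustration and are not taken from the dataset.

```python
import ast


def run_mbpp_style_asserts(code: str, test_imports: str, test_list: str) -> list:
    """Run each assert in `test_list` against `code`; returns one bool per assert.

    Assumption: both string fields are Python list literals.
    exec() is used without a sandbox, so only run code you trust.
    """
    imports = ast.literal_eval(test_imports)  # e.g. "[]" -> []
    asserts = ast.literal_eval(test_list)     # list of "assert ..." strings
    namespace: dict = {}
    exec("\n".join(imports), namespace)
    exec(code, namespace)
    results = []
    for statement in asserts:
        try:
            exec(statement, namespace)
            results.append(True)
        except Exception:
            results.append(False)
    return results


# Illustrative inputs; the assert strings are made up for this sketch.
code = (
    "def similar_elements(test_tup1, test_tup2):\n"
    "    return tuple(set(test_tup1) & set(test_tup2))"
)
test_imports = "[]"
test_list = repr([
    "assert set(similar_elements((3, 4, 5, 6), (5, 7, 4, 10))) == {4, 5}",
    "assert set(similar_elements((1, 2), (3, 4))) == set()",
    "assert set(similar_elements((1, 2), (2, 3))) == {1}",  # deliberately wrong
])

results = run_mbpp_style_asserts(code, test_imports, test_list)
print(results)  # [True, True, False]
```

Reporting one result per assert, rather than stopping at the first failure, makes it easier to see which cases a generated solution misses.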
## Data Sources
- Original datasets: [MBPP+](https://huggingface.co/datasets/evalplus/mbppplus), [HumanEval+](https://huggingface.co/datasets/evalplus/humanevalplus)
- Translated with GPT-4o
- Validated via backtranslation with ROUGE-L F1 thresholds (0.8+), followed by human review
## Translation Methodology
- **Backtranslation** to ensure fidelity
- **Threshold-based filtering** and **manual review**
- **Arabic prompts only**, with code/test logic unchanged to preserve function behavior
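The backtranslation filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it uses lowercased whitespace tokenization and a plain longest-common-subsequence ROUGE-L F1, while the actual tooling, tokenization, and exact threshold handling are not specified in this card. All function names are hypothetical.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 over lowercased, whitespace-split tokens (a simplification)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def keep_translation(original_en: str, backtranslated_en: str, threshold: float = 0.8) -> bool:
    """Keep a translation if its backtranslation stays close to the original."""
    return rouge_l_f1(original_en, backtranslated_en) >= threshold


original = "Write a function to find the shared elements from the given two lists."
close = "Write a function to find the common elements from the given two lists."
print(keep_translation(original, close))          # True: 12 of 13 tokens align
print(keep_translation(original, "Find items."))  # False: almost no overlap
```

Items below the threshold would then go to manual review, matching the filtering and review steps listed above.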
## Code and Paper
- EvalPlus-Arabic dataset on GitHub: https://github.com/tiiuae/3LM-benchmark/frameworks/evalplus-arabic/evalplus/data/data_files
- 3LM repo on GitHub: https://github.com/tiiuae/3LM-benchmark
- 3LM paper: https://aclanthology.org/2025.arabicnlp-main.4/
## Licensing
[Falcon LLM Licence](https://falconllm.tii.ae/falcon-terms-and-conditions.html)
## Citation
```bibtex
@inproceedings{boussaha-etal-2025-3lm,
title = "3{LM}: Bridging {A}rabic, {STEM}, and Code through Benchmarking",
author = "Boussaha, Basma El Amel and
Al Qadi, Leen and
Farooq, Mugariya and
Alsuwaidi, Shaikha and
Campesan, Giulia and
Alzubaidi, Ahmed and
Alyafeai, Mohammed and
Hacid, Hakim",
booktitle = "Proceedings of The Third Arabic Natural Language Processing Conference",
month = nov,
year = "2025",
address = "Suzhou, China",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.arabicnlp-main.4/",
doi = "10.18653/v1/2025.arabicnlp-main.4",
pages = "42--63",
ISBN = "979-8-89176-352-4",
}
```
Provider: tiiuae