Hennara/ammlu
Source: Hugging Face. Updated 2024-03-02; indexed 2024-03-04.
Download link: https://hf-mirror.com/datasets/Hennara/ammlu
Resource description:
---
task_categories:
- question-answering
language:
- ar
size_categories:
- 10K<n<100K
---
# Dataset Card for Arabic MMLU
Arabic MMLU: Measuring massive multitask language understanding in Arabic
This dataset was translated from the original MMLU with the help of GPT-4.
Original paper: [MMLU](https://arxiv.org/pdf/2009.03300v3.pdf)
Original dataset on Hugging Face: [cais/mmlu](https://huggingface.co/datasets/cais/mmlu)
### Dataset Sources
The translation and re-generation were done by the AceGPT researchers ([AceGPT](https://arxiv.org/abs/2309.12053)).
- [**Repository**](https://github.com/FreedomIntelligence/AceGPT/tree/main/eval/benchmark_eval/benchmarks/MMLUArabic)
- [**Paper**](https://arxiv.org/abs/2309.12053)
## Uses
Arabic-MMLU is a comprehensive evaluation benchmark specifically designed to evaluate the knowledge and reasoning abilities of LLMs within the context of Arabic language and culture.
Arabic-MMLU covers a wide range of subjects, comprising 57 topics that span from elementary to advanced professional levels.
### Direct Use
This dataset can be used directly with the Hugging Face [datasets](https://github.com/huggingface/datasets) library, and it is also available through the [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) framework.
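As a minimal sketch of the direct-use path, the helper below loads one subject through the `datasets` library. It assumes that each subject is exposed as a configuration named after the subject (for example `abstract_algebra`) and that the evaluation split is called `test`. This card confirms neither, so check the repository's file listing before relying on them. `load_subject` is a hypothetical helper name, not part of any library.

```python
DATASET_ID = "Hennara/ammlu"  # repository name taken from this card


def load_subject(subject: str, split: str = "test"):
    """Load one subject of the dataset from the Hugging Face Hub.

    Assumption: the subject name doubles as the configuration name and a
    split called ``split`` exists. Neither is confirmed by the card.
    """
    # Imported lazily so this module can be imported without the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, subject, split=split)
```

Calling `load_subject("abstract_algebra")` requires network access and the `datasets` package. If the assumed configuration or split name is wrong, the call fails, and the names have to be adjusted to whatever the repository actually provides.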
## Dataset Structure
The dataset consists of 57 subjects, divided into 4 categories.
| Subject Area | STEM | Humanities | Social Sciences | Other |
|---|---|---|---|---|
| abstract_algebra | ✓ | | | |
| anatomy | ✓ | | | |
| astronomy | ✓ | | | |
| business_ethics | | | | ✓ |
| clinical_knowledge | | | | ✓ |
| college_biology | ✓ | | | |
| college_chemistry | ✓ | | | |
| college_computer_science | ✓ | | | |
| college_mathematics | ✓ | | | |
| college_medicine | | | | ✓ |
| college_physics | ✓ | | | |
| computer_security | ✓ | | | |
| conceptual_physics | ✓ | | | |
| econometrics | | | ✓ | |
| electrical_engineering | ✓ | | | |
| elementary_mathematics | ✓ | | | |
| formal_logic | | ✓ | | |
| global_facts | | | | ✓ |
| high_school_biology | ✓ | | | |
| high_school_chemistry | ✓ | | | |
| high_school_computer_science | ✓ | | | |
| high_school_european_history | | ✓ | | |
| high_school_geography | | | ✓ | |
| high_school_government_and_politics | | | ✓ | |
| high_school_macroeconomics | | | ✓ | |
| high_school_mathematics | ✓ | | | |
| high_school_microeconomics | | | ✓ | |
| high_school_physics | ✓ | | | |
| high_school_psychology | | | ✓ | |
| high_school_statistics | ✓ | | | |
| high_school_us_history | | ✓ | | |
| high_school_world_history | | ✓ | | |
| human_aging | | | | ✓ |
| human_sexuality | | | ✓ | |
| international_law | | ✓ | | |
| jurisprudence | | ✓ | | |
| logical_fallacies | | ✓ | | |
| machine_learning | ✓ | | | |
| management | | | | ✓ |
| marketing | | | | ✓ |
| medical_genetics | | | | ✓ |
| miscellaneous | | | | ✓ |
| moral_disputes | | ✓ | | |
| moral_scenarios | | ✓ | | |
| nutrition | | | | ✓ |
| philosophy | | ✓ | | |
| prehistory | | ✓ | | |
| professional_accounting | | | | ✓ |
| professional_law | | ✓ | | |
| professional_medicine | | | | ✓ |
| professional_psychology | | | ✓ | |
| public_relations | | | ✓ | |
| security_studies | | | ✓ | |
| sociology | | | ✓ | |
| us_foreign_policy | | | ✓ | |
| virology | | | | ✓ |
| world_religions | | ✓ | | |

Each item in the dataset is a dictionary with the keys **Question, A, B, C, D, Answer**, where A, B, C, and D are the options to choose from.
Here are three examples from the abstract algebra subject.

| Question | A | B | C | D | Answer |
|---|---|---|---|---|---|
| مجموعة فرعية H من مجموعة (G،*) هي مجموعة إذا | a، b في H => a * b في H | a في H => a^-1 في H | a، b في H => a * b^-1 في H | H يحتوي على العنصر المحدد | C |
| ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8 | 2 | 4 | 8 | 12 | C |
| ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q | 0 | 4 | 2 | 6 | B |
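The record layout above can be turned into a multiple-choice prompt with a few lines of Python. The sketch below is only an illustration: the prompt template, the helper names `format_prompt` and `is_correct`, and the choice to store option values as strings are all assumptions, and it is not the template used by lm-eval or by the AceGPT authors. Only the keys `Question`, `A`, `B`, `C`, `D`, and `Answer` come from this card.

```python
OPTION_KEYS = ("A", "B", "C", "D")


def format_prompt(item: dict) -> str:
    """Render one record as a multiple-choice prompt ending in 'Answer:'."""
    lines = [item["Question"]]
    lines += [f"{key}. {item[key]}" for key in OPTION_KEYS]
    lines.append("Answer:")
    return "\n".join(lines)


def is_correct(item: dict, prediction: str) -> bool:
    """Compare a predicted letter with the gold letter, ignoring case and spaces."""
    return prediction.strip().upper() == item["Answer"].strip().upper()


# Third example from the table above; option values are kept as strings here,
# which is an assumption about how the dataset stores them.
example = {
    "Question": "ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q",
    "A": "0",
    "B": "4",
    "C": "2",
    "D": "6",
    "Answer": "B",
}

prompt = format_prompt(example)
```

The prompt ends with `Answer:` so that a model's next token can be read as the predicted letter and passed to `is_correct`.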

The table below lists the number of examples per subject in each split.
| Subject | Test Length | Eval Length |
|---|---|---|
| professional_law | 1534 | 5 |
| moral_scenarios | 895 | 5 |
| miscellaneous | 783 | 5 |
| professional_psychology | 612 | 5 |
| high_school_psychology | 545 | 5 |
| high_school_macroeconomics | 390 | 5 |
| elementary_mathematics | 378 | 5 |
| moral_disputes | 346 | 5 |
| prehistory | 324 | 5 |
| philosophy | 311 | 5 |
| high_school_biology | 310 | 5 |
| nutrition | 306 | 5 |
| professional_accounting | 282 | 5 |
| professional_medicine | 272 | 5 |
| high_school_mathematics | 270 | 5 |
| clinical_knowledge | 265 | 5 |
| security_studies | 245 | 5 |
| high_school_microeconomics | 238 | 5 |
| high_school_world_history | 237 | 5 |
| conceptual_physics | 235 | 5 |
| marketing | 234 | 5 |
| human_aging | 223 | 5 |
| high_school_statistics | 216 | 5 |
| high_school_us_history | 204 | 5 |
| high_school_chemistry | 203 | 5 |
| sociology | 201 | 5 |
| high_school_geography | 198 | 5 |
| high_school_government_and_politics | 193 | 5 |
| college_medicine | 173 | 5 |
| world_religions | 171 | 5 |
| virology | 166 | 5 |
| high_school_european_history | 165 | 5 |
| logical_fallacies | 163 | 5 |
| astronomy | 152 | 5 |
| high_school_physics | 151 | 5 |
| electrical_engineering | 145 | 5 |
| college_biology | 144 | 5 |
| anatomy | 135 | 5 |
| human_sexuality | 131 | 5 |
| formal_logic | 126 | 5 |
| international_law | 121 | 5 |
| econometrics | 114 | 5 |
| machine_learning | 112 | 5 |
| public_relations | 110 | 5 |
| jurisprudence | 108 | 5 |
| management | 103 | 5 |
| college_physics | 102 | 5 |
| abstract_algebra | 100 | 5 |
| business_ethics | 100 | 5 |
| college_chemistry | 100 | 5 |
| college_computer_science | 100 | 5 |
| college_mathematics | 100 | 5 |
| computer_security | 100 | 5 |
| global_facts | 100 | 5 |
| high_school_computer_science | 100 | 5 |
| medical_genetics | 100 | 5 |
| us_foreign_policy | 100 | 5 |
| **Total** | 14042 | 285 |
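Because test sizes range from 100 to 1534 examples, an overall score depends on whether subjects are weighted equally (macro average) or by their number of test examples (micro average). The sketch below contrasts the two on three subjects. The sizes are copied from the table above, while the accuracy values are hypothetical numbers invented for the illustration. This card does not say which averaging the authors report.

```python
# Test-set sizes for three subjects, copied from the table above.
TEST_SIZES = {
    "professional_law": 1534,
    "moral_scenarios": 895,
    "abstract_algebra": 100,
}

# Hypothetical per-subject accuracies, invented only for this illustration.
accuracy = {
    "professional_law": 0.40,
    "moral_scenarios": 0.30,
    "abstract_algebra": 0.50,
}


def macro_average(acc: dict) -> float:
    """Unweighted mean: every subject counts equally."""
    return sum(acc.values()) / len(acc)


def micro_average(acc: dict, sizes: dict) -> float:
    """Mean weighted by the number of test examples in each subject."""
    total = sum(sizes[subject] for subject in acc)
    weighted = sum(acc[subject] * sizes[subject] for subject in acc)
    return weighted / total


macro = macro_average(accuracy)
micro = micro_average(accuracy, TEST_SIZES)
```

With these made-up numbers the micro average is lower than the macro average, because the largest subject, `professional_law`, carries most of the weight.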
Provider: Hennara