Hennara/ammlu

Name: Hennara/ammlu
Creator: Hennara
Published: 2024-03-02 17:20:25
License: No description available

Hugging Face | Updated 2024-03-02 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/Hennara/ammlu

Resource description:

---
task_categories:
- question-answering
language:
- ar
size_categories:
- 10K<n<100K
---

# Dataset Card for Arabic MMLU

Arabic MMLU: Measuring massive multitask language understanding in Arabic

This dataset was translated from the original MMLU with the help of GPT-4.

- The original data paper: [MMLU](https://arxiv.org/pdf/2009.03300v3.pdf)
- The MMLU dataset on Hugging Face: [MMLU](cais/mmlu)

### Dataset Sources

The translation and re-generation were done by the AceGPT researchers: [AceGPT](https://arxiv.org/abs/2309.12053)

- [**Repository:**](https://github.com/FreedomIntelligence/AceGPT/tree/main/eval/benchmark_eval/benchmarks/MMLUArabic)
- [**Paper**](https://arxiv.org/abs/2309.12053)

## Uses

Arabic-MMLU is a comprehensive evaluation benchmark specifically designed to evaluate the knowledge and reasoning abilities of LLMs within the context of Arabic language and culture. Arabic-MMLU covers a wide range of subjects, comprising 57 topics that span from elementary to advanced professional levels.

### Direct Use

This dataset can be used directly with the [datasets](https://github.com/huggingface/datasets) library from Hugging Face, and it can also be used with the [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) framework.

## Dataset Structure

The dataset consists of 57 subjects, divided into 4 categories.

| Subject Area | STEM | Humanities | Social Sciences | Other |
|---|---|---|---|---|
| abstract_algebra | ✓ | | | |
| anatomy | ✓ | | | |
| astronomy | ✓ | | | |
| business_ethics | | | | ✓ |
| clinical_knowledge | | | | ✓ |
| college_biology | ✓ | | | |
| college_chemistry | ✓ | | | |
| college_computer_science | ✓ | | | |
| college_mathematics | ✓ | | | |
| college_medicine | | | | ✓ |
| college_physics | ✓ | | | |
| computer_security | ✓ | | | |
| conceptual_physics | ✓ | | | |
| econometrics | | | ✓ | |
| electrical_engineering | ✓ | | | |
| elementary_mathematics | ✓ | | | |
| formal_logic | | ✓ | | |
| global_facts | | | | ✓ |
| high_school_biology | ✓ | | | |
| high_school_chemistry | ✓ | | | |
| high_school_computer_science | ✓ | | | |
| high_school_european_history | | ✓ | | |
| high_school_geography | | | ✓ | |
| high_school_government_and_politics | | | ✓ | |
| high_school_macroeconomics | | | ✓ | |
| high_school_mathematics | ✓ | | | |
| high_school_microeconomics | | | ✓ | |
| high_school_physics | ✓ | | | |
| high_school_psychology | | | ✓ | |
| high_school_statistics | ✓ | | | |
| high_school_us_history | | ✓ | | |
| high_school_world_history | | ✓ | | |
| human_aging | | | | ✓ |
| human_sexuality | | | ✓ | |
| international_law | | ✓ | | |
| jurisprudence | | ✓ | | |
| logical_fallacies | | ✓ | | |
| machine_learning | ✓ | | | |
| management | | | | ✓ |
| marketing | | | | ✓ |
| medical_genetics | | | | ✓ |
| miscellaneous | | | | ✓ |
| moral_disputes | | ✓ | | |
| moral_scenarios | | ✓ | | |
| nutrition | | | | ✓ |
| philosophy | | ✓ | | |
| prehistory | | ✓ | | |
| professional_accounting | | | | ✓ |
| professional_law | | ✓ | | |
| professional_medicine | | | | ✓ |
| professional_psychology | | | ✓ | |
| public_relations | | | ✓ | |
| security_studies | | | ✓ | |
| sociology | | | ✓ | |
| us_foreign_policy | | | ✓ | |
| virology | | | | ✓ |
| world_religions | | ✓ | | |

Each item of the dataset is a dictionary with the fields **Question, A, B, C, D, Answer**, where A, B, C, and D are the options to choose from. Here are three examples from the abstract algebra subject.

| Question | A | B | C | D | Answer |
|---|---|---|---|---|---|
| مجموعة فرعية H من مجموعة (G،*) هي مجموعة إذا | 'a، b في H => a * b في H' | 'a في H => a^-1 في H' | 'a، b في H => a * b^-1 في H' | 'H يحتوي على العنصر المحدد' | C |
| 'ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8' | 2 | 4 | 8 | 12 | C |
| ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q | 0 | 4 | 2 | 6 | B |

The size of each subject within the dataset:

| Subject | Test Length | Eval Length |
|---|---|---|
| professional_law | 1534 | 5 |
| moral_scenarios | 895 | 5 |
| miscellaneous | 783 | 5 |
| professional_psychology | 612 | 5 |
| high_school_psychology | 545 | 5 |
| high_school_macroeconomics | 390 | 5 |
| elementary_mathematics | 378 | 5 |
| moral_disputes | 346 | 5 |
| prehistory | 324 | 5 |
| philosophy | 311 | 5 |
| high_school_biology | 310 | 5 |
| nutrition | 306 | 5 |
| professional_accounting | 282 | 5 |
| professional_medicine | 272 | 5 |
| high_school_mathematics | 270 | 5 |
| clinical_knowledge | 265 | 5 |
| security_studies | 245 | 5 |
| high_school_microeconomics | 238 | 5 |
| high_school_world_history | 237 | 5 |
| conceptual_physics | 235 | 5 |
| marketing | 234 | 5 |
| human_aging | 223 | 5 |
| high_school_statistics | 216 | 5 |
| high_school_us_history | 204 | 5 |
| high_school_chemistry | 203 | 5 |
| sociology | 201 | 5 |
| high_school_geography | 198 | 5 |
| high_school_government_and_politics | 193 | 5 |
| college_medicine | 173 | 5 |
| world_religions | 171 | 5 |
| virology | 166 | 5 |
| high_school_european_history | 165 | 5 |
| logical_fallacies | 163 | 5 |
| astronomy | 152 | 5 |
| high_school_physics | 151 | 5 |
| electrical_engineering | 145 | 5 |
| college_biology | 144 | 5 |
| anatomy | 135 | 5 |
| human_sexuality | 131 | 5 |
| formal_logic | 126 | 5 |
| international_law | 121 | 5 |
| econometrics | 114 | 5 |
| machine_learning | 112 | 5 |
| public_relations | 110 | 5 |
| jurisprudence | 108 | 5 |
| management | 103 | 5 |
| college_physics | 102 | 5 |
| abstract_algebra | 100 | 5 |
| business_ethics | 100 | 5 |
| college_chemistry | 100 | 5 |
| college_computer_science | 100 | 5 |
| college_mathematics | 100 | 5 |
| computer_security | 100 | 5 |
| global_facts | 100 | 5 |
| high_school_computer_science | 100 | 5 |
| medical_genetics | 100 | 5 |
| us_foreign_policy | 100 | 5 |
| Total | 14042 | 285 |
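
The card says the dataset can be used directly with the `datasets` library but gives no snippet. The following is a minimal sketch of what that might look like. It assumes each subject (for example `abstract_algebra`) is exposed as a configuration name and that a split called `test` exists; neither is confirmed by the card, so check the repository's file layout first. `load_subject` and `missing_fields` are hypothetical helper names introduced here for illustration.

```python
REPO_ID = "Hennara/ammlu"  # dataset id as listed on this page

# Field names as described in the card's "Dataset Structure" section.
EXPECTED_FIELDS = ("Question", "A", "B", "C", "D", "Answer")


def load_subject(subject: str, split: str = "test"):
    """Load one subject from the Hub.

    Assumption: the subject name is a valid config name and `split`
    exists; the card does not state either explicitly.
    """
    # Imported lazily so the helpers below work without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, subject, split=split)


def missing_fields(item: dict) -> list:
    """Return the expected field names that are absent from an item."""
    return [name for name in EXPECTED_FIELDS if name not in item]
```

With network access, `load_subject("abstract_algebra")` would return the rows for that subject if the assumptions hold, and `missing_fields(rows[0])` gives a quick check that the loaded schema matches the card.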

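Provider:

Hennara

Summary of original information

Dataset card

Dataset name

Arabic MMLU: Measuring massive multitask language understanding in Arabic

Dataset overview

This dataset was translated from the original MMLU dataset, with GPT-4 used for translation and re-generation.

Dataset sources

The translation and re-generation were carried out by the AceGPT researchers.

Uses

Arabic-MMLU is a comprehensive evaluation benchmark designed specifically to evaluate the knowledge and reasoning abilities of large language models (LLMs) within the context of Arabic language and culture. It covers 57 topics ranging from elementary to advanced professional levels.

Dataset structure

The dataset contains 57 subjects, divided into four categories: STEM, Humanities, Social Sciences, and Other.

Subject categories

Subject Area	STEM	Humanities	Social Sciences	Other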
abstract_algebra	✓
anatomy	✓
astronomy	✓
business_ethics				✓
clinical_knowledge				✓
college_biology	✓
college_chemistry	✓
college_computer_science	✓
college_mathematics	✓
college_medicine				✓
college_physics	✓
computer_security	✓
conceptual_physics	✓
econometrics			✓
electrical_engineering	✓
elementary_mathematics	✓
formal_logic		✓
global_facts				✓
high_school_biology	✓
high_school_chemistry	✓
high_school_computer_science	✓
high_school_european_history		✓
high_school_geography			✓
high_school_government_and_politics			✓
high_school_macroeconomics			✓
high_school_mathematics	✓
high_school_microeconomics			✓
high_school_physics	✓
high_school_psychology			✓
high_school_statistics	✓
high_school_us_history		✓
high_school_world_history		✓
human_aging				✓
human_sexuality			✓
international_law		✓
jurisprudence		✓
logical_fallacies		✓
machine_learning	✓
management				✓
marketing				✓
medical_genetics				✓
miscellaneous				✓
moral_disputes		✓
moral_scenarios		✓
nutrition				✓
philosophy		✓
prehistory		✓
professional_accounting				✓
professional_law		✓
professional_medicine				✓
professional_psychology			✓
public_relations			✓
security_studies			✓
sociology			✓
us_foreign_policy			✓
virology				✓
world_religions		✓

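Each item in the dataset is a dictionary with the fields Question, A, B, C, D, and Answer, where A, B, C, and D are the answer options.

Examples

Question	A	B	C	D	Answer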
مجموعة فرعية H من مجموعة (G،*) هي مجموعة إذا	a، b في H => a * b في H	a في H => a^-1 في H	a، b في H => a * b^-1 في H	H يحتوي على العنصر المحدد	C
ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8	2	4	8	12	C
ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q	0	4	2	6	B
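
To make the item layout concrete, here is a small sketch of how one record might be turned into a multiple-choice prompt and scored. The item is the third example above, copied as a dictionary. The prompt template (including the English `Answer:` cue) and the helper names `format_prompt` and `is_correct` are illustrative assumptions, not the format used by the AceGPT authors or by lm-eval.

```python
# Third example row from the table above, using the card's field names.
ITEM = {
    "Question": "ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q",
    "A": "0",
    "B": "4",
    "C": "2",
    "D": "6",
    "Answer": "B",
}


def format_prompt(item: dict) -> str:
    """Render the question, the four lettered options, and an answer cue."""
    lines = [item["Question"]]
    for letter in "ABCD":
        lines.append(f"{letter}. {item[letter]}")
    lines.append("Answer:")  # hypothetical cue; pick whatever your model expects
    return "\n".join(lines)


def is_correct(item: dict, predicted: str) -> bool:
    """Compare a predicted letter with the gold letter, ignoring case and spaces."""
    return predicted.strip().upper() == item["Answer"].strip().upper()


print(format_prompt(ITEM))
```

Scoring by letter keeps the comparison independent of the Arabic option text, which is convenient when a model echoes the option with different punctuation.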

Dataset size

The size of each subject in the dataset is as follows:

Subject	Test Length	Eval Length
professional_law	1534	5
moral_scenarios	895	5
miscellaneous	783	5
professional_psychology	612	5
high_school_psychology	545	5
high_school_macroeconomics	390	5
elementary_mathematics	378	5
moral_disputes	346	5
prehistory	324	5
philosophy	311	5
high_school_biology	310	5
nutrition	306	5
professional_accounting	282	5
professional_medicine	272	5
high_school_mathematics	270	5
clinical_knowledge	265	5
security_studies	245	5
high_school_microeconomics	238	5
high_school_world_history	237	5
conceptual_physics	235	5
marketing	234	5
human_aging	223	5
high_school_statistics	216	5
high_school_us_history	204	5
high_school_chemistry	203	5
sociology	201	5
high_school_geography	198	5
high_school_government_and_politics	193	5
college_medicine	173	5
world_religions	171	5
virology	166	5
high_school_european_history	165	5
logical_fallacies	163	5
astronomy	152	5
high_school_physics	151	5
electrical_engineering	145	5
college_biology	144	5
anatomy	135	5
human_sexuality	131	5
formal_logic	126	5
international_law	121	5
econometrics	114	5
machine_learning	112	5
public_relations	110	5
jurisprudence	108	5
management	103	5
college_physics	102	5
abstract_algebra	100	5
business_ethics	100	5
college_chemistry	100	5
college_computer_science	100	5
college_mathematics	100	5
computer_security	100	5
global_facts	100	5
high_school_computer_science	100	5
medical_genetics	100	5
us_foreign_policy	100	5
Total	14042	285
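
Because test sizes range from 100 to 1534 questions, an overall score depends on whether subjects are weighted by size. The sketch below contrasts the two common choices using three sizes taken from the table above. The per-subject correct counts are made-up numbers for illustration only, and the source does not say which averaging the benchmark authors report.

```python
# Test sizes copied from the table above (three subjects only).
TEST_SIZES = {
    "professional_law": 1534,
    "moral_scenarios": 895,
    "abstract_algebra": 100,
}

# Hypothetical numbers of correct answers, invented for this example.
CORRECT = {
    "professional_law": 767,
    "moral_scenarios": 179,
    "abstract_algebra": 50,
}


def micro_accuracy(correct: dict, sizes: dict) -> float:
    """Pool all questions: every question counts equally."""
    return sum(correct.values()) / sum(sizes[s] for s in correct)


def macro_accuracy(correct: dict, sizes: dict) -> float:
    """Average the per-subject accuracies: every subject counts equally."""
    per_subject = [correct[s] / sizes[s] for s in correct]
    return sum(per_subject) / len(per_subject)


print(f"micro: {micro_accuracy(CORRECT, TEST_SIZES):.4f}")
print(f"macro: {macro_accuracy(CORRECT, TEST_SIZES):.4f}")
```

In this toy case the pooled score is pulled toward the large subjects, so it is worth stating which average is used when comparing results across papers.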
