Hennara/ammlu

Source: Hugging Face. Updated 2024-03-02. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/Hennara/ammlu
Resource description:
---
task_categories:
- question-answering
language:
- ar
size_categories:
- 10K<n<100K
---

# Dataset Card for Arabic MMLU

Arabic MMLU: Measuring massive multitask language understanding in Arabic

This dataset has been translated from the original MMLU with the help of GPT-4.

The original data paper: [MMLU](https://arxiv.org/pdf/2009.03300v3.pdf)

The MMLU dataset on Hugging Face: [MMLU](cais/mmlu)

### Dataset Sources

The translation and re-generation were done by the AceGPT researchers ([AceGPT](https://arxiv.org/abs/2309.12053)).

- [**Repository**](https://github.com/FreedomIntelligence/AceGPT/tree/main/eval/benchmark_eval/benchmarks/MMLUArabic)
- [**Paper**](https://arxiv.org/abs/2309.12053)

## Uses

Arabic-MMLU is a comprehensive evaluation benchmark specifically designed to evaluate the knowledge and reasoning abilities of LLMs within the context of Arabic language and culture. It covers 57 topics that span from elementary to advanced professional levels.

### Direct Use

This dataset can be used directly with the [datasets](https://github.com/huggingface/datasets) library from Hugging Face. It is also available through the [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) framework.

## Dataset Structure

The dataset consists of 57 subjects, divided into 4 categories.

| Subject Area | STEM | Humanities | Social Sciences | Other |
|---|---|---|---|---|
| abstract_algebra | ✓ | | | |
| anatomy | ✓ | | | |
| astronomy | ✓ | | | |
| business_ethics | | | | ✓ |
| clinical_knowledge | | | | ✓ |
| college_biology | ✓ | | | |
| college_chemistry | ✓ | | | |
| college_computer_science | ✓ | | | |
| college_mathematics | ✓ | | | |
| college_medicine | | | | ✓ |
| college_physics | ✓ | | | |
| computer_security | ✓ | | | |
| conceptual_physics | ✓ | | | |
| econometrics | | | ✓ | |
| electrical_engineering | ✓ | | | |
| elementary_mathematics | ✓ | | | |
| formal_logic | | ✓ | | |
| global_facts | | | | ✓ |
| high_school_biology | ✓ | | | |
| high_school_chemistry | ✓ | | | |
| high_school_computer_science | ✓ | | | |
| high_school_european_history | | ✓ | | |
| high_school_geography | | | ✓ | |
| high_school_government_and_politics | | | ✓ | |
| high_school_macroeconomics | | | ✓ | |
| high_school_mathematics | ✓ | | | |
| high_school_microeconomics | | | ✓ | |
| high_school_physics | ✓ | | | |
| high_school_psychology | | | ✓ | |
| high_school_statistics | ✓ | | | |
| high_school_us_history | | ✓ | | |
| high_school_world_history | | ✓ | | |
| human_aging | | | | ✓ |
| human_sexuality | | | ✓ | |
| international_law | | ✓ | | |
| jurisprudence | | ✓ | | |
| logical_fallacies | | ✓ | | |
| machine_learning | ✓ | | | |
| management | | | | ✓ |
| marketing | | | | ✓ |
| medical_genetics | | | | ✓ |
| miscellaneous | | | | ✓ |
| moral_disputes | | ✓ | | |
| moral_scenarios | | ✓ | | |
| nutrition | | | | ✓ |
| philosophy | | ✓ | | |
| prehistory | | ✓ | | |
| professional_accounting | | | | ✓ |
| professional_law | | ✓ | | |
| professional_medicine | | | | ✓ |
| professional_psychology | | | ✓ | |
| public_relations | | | ✓ | |
| security_studies | | | ✓ | |
| sociology | | | ✓ | |
| us_foreign_policy | | | ✓ | |
| virology | | | | ✓ |
| world_religions | | ✓ | | |

Each item of the dataset is a dictionary with the fields **Question, A, B, C, D, Answer**, where A, B, C, and D are the options to choose from.

Here are three examples from the abstract algebra subject.

| Question | A | B | C | D | Answer |
|---|---|---|---|---|---|
| مجموعة فرعية H من مجموعة (G،*) هي مجموعة إذا | 'a، b في H => a * b في H' | 'a في H => a^-1 في H' | 'a، b في H => a * b^-1 في H' | 'H يحتوي على العنصر المحدد' | C |
| 'ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8' | 2 | 4 | 8 | 12 | C |
| ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q | 0 | 4 | 2 | 6 | B |

The size of each subject within the dataset:

| Subject | Test Length | Eval Length |
|---|---|---|
| professional_law | 1534 | 5 |
| moral_scenarios | 895 | 5 |
| miscellaneous | 783 | 5 |
| professional_psychology | 612 | 5 |
| high_school_psychology | 545 | 5 |
| high_school_macroeconomics | 390 | 5 |
| elementary_mathematics | 378 | 5 |
| moral_disputes | 346 | 5 |
| prehistory | 324 | 5 |
| philosophy | 311 | 5 |
| high_school_biology | 310 | 5 |
| nutrition | 306 | 5 |
| professional_accounting | 282 | 5 |
| professional_medicine | 272 | 5 |
| high_school_mathematics | 270 | 5 |
| clinical_knowledge | 265 | 5 |
| security_studies | 245 | 5 |
| high_school_microeconomics | 238 | 5 |
| high_school_world_history | 237 | 5 |
| conceptual_physics | 235 | 5 |
| marketing | 234 | 5 |
| human_aging | 223 | 5 |
| high_school_statistics | 216 | 5 |
| high_school_us_history | 204 | 5 |
| high_school_chemistry | 203 | 5 |
| sociology | 201 | 5 |
| high_school_geography | 198 | 5 |
| high_school_government_and_politics | 193 | 5 |
| college_medicine | 173 | 5 |
| world_religions | 171 | 5 |
| virology | 166 | 5 |
| high_school_european_history | 165 | 5 |
| logical_fallacies | 163 | 5 |
| astronomy | 152 | 5 |
| high_school_physics | 151 | 5 |
| electrical_engineering | 145 | 5 |
| college_biology | 144 | 5 |
| anatomy | 135 | 5 |
| human_sexuality | 131 | 5 |
| formal_logic | 126 | 5 |
| international_law | 121 | 5 |
| econometrics | 114 | 5 |
| machine_learning | 112 | 5 |
| public_relations | 110 | 5 |
| jurisprudence | 108 | 5 |
| management | 103 | 5 |
| college_physics | 102 | 5 |
| abstract_algebra | 100 | 5 |
| business_ethics | 100 | 5 |
| college_chemistry | 100 | 5 |
| college_computer_science | 100 | 5 |
| college_mathematics | 100 | 5 |
| computer_security | 100 | 5 |
| global_facts | 100 | 5 |
| high_school_computer_science | 100 | 5 |
| medical_genetics | 100 | 5 |
| us_foreign_policy | 100 | 5 |
| Total | 14042 | 285 |
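
The Direct Use note can be illustrated with a minimal sketch built on the `datasets` library. It rests on assumptions the card does not confirm: that each subject is exposed as a configuration name, that a split named `test` exists, and that the hosted columns are spelled exactly `Question`, `A`, `B`, `C`, `D`, `Answer`. The helper names `load_subject` and `is_valid_item` are hypothetical.

```python
from typing import Any, Mapping

# Field names as listed in the card; the exact column names in the hosted
# files are an assumption and may differ in case or spelling.
REQUIRED_FIELDS = ("Question", "A", "B", "C", "D", "Answer")


def is_valid_item(item: Mapping[str, Any]) -> bool:
    """Return True if the item has all six fields and a letter answer A-D."""
    has_fields = all(field in item for field in REQUIRED_FIELDS)
    return has_fields and item["Answer"] in ("A", "B", "C", "D")


def load_subject(subject: str, split: str = "test"):
    """Load one subject from the Hugging Face Hub.

    Assumes each subject is a configuration name and that a split called
    `split` exists; neither detail is confirmed by the card.
    """
    # Imported lazily so the validation helper works without the library.
    from datasets import load_dataset

    return load_dataset("Hennara/ammlu", subject, split=split)
```

Under those assumptions, `load_subject("abstract_algebra")` would return the test rows for that subject, and `is_valid_item` could be applied to each row as a quick sanity check on the schema.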
Provider:
Hennara
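Summary of original information

Dataset card

Dataset name

Arabic MMLU: Measuring massive multitask language understanding in Arabic

Dataset overview

This dataset was translated from the original MMLU dataset, with GPT-4 used for translation and re-generation.

Dataset sources

The translation and re-generation were carried out by the AceGPT researchers.

Dataset uses

Arabic-MMLU is a comprehensive evaluation benchmark designed specifically to assess the knowledge and reasoning abilities of large language models (LLMs) in the context of Arabic language and culture. It covers 57 topics ranging from elementary to advanced professional level.

Dataset structure

The dataset contains 57 subjects, divided into four categories: STEM, Humanities, Social Sciences, and Other.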

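Subject categories

| Subject Area | STEM | Humanities | Social Sciences | Other |
|---|---|---|---|---|
| abstract_algebra | ✓ | | | |
| anatomy | ✓ | | | |
| astronomy | ✓ | | | |
| business_ethics | | | | ✓ |
| clinical_knowledge | | | | ✓ |
| college_biology | ✓ | | | |
| college_chemistry | ✓ | | | |
| college_computer_science | ✓ | | | |
| college_mathematics | ✓ | | | |
| college_medicine | | | | ✓ |
| college_physics | ✓ | | | |
| computer_security | ✓ | | | |
| conceptual_physics | ✓ | | | |
| econometrics | | | ✓ | |
| electrical_engineering | ✓ | | | |
| elementary_mathematics | ✓ | | | |
| formal_logic | | ✓ | | |
| global_facts | | | | ✓ |
| high_school_biology | ✓ | | | |
| high_school_chemistry | ✓ | | | |
| high_school_computer_science | ✓ | | | |
| high_school_european_history | | ✓ | | |
| high_school_geography | | | ✓ | |
| high_school_government_and_politics | | | ✓ | |
| high_school_macroeconomics | | | ✓ | |
| high_school_mathematics | ✓ | | | |
| high_school_microeconomics | | | ✓ | |
| high_school_physics | ✓ | | | |
| high_school_psychology | | | ✓ | |
| high_school_statistics | ✓ | | | |
| high_school_us_history | | ✓ | | |
| high_school_world_history | | ✓ | | |
| human_aging | | | | ✓ |
| human_sexuality | | | ✓ | |
| international_law | | ✓ | | |
| jurisprudence | | ✓ | | |
| logical_fallacies | | ✓ | | |
| machine_learning | ✓ | | | |
| management | | | | ✓ |
| marketing | | | | ✓ |
| medical_genetics | | | | ✓ |
| miscellaneous | | | | ✓ |
| moral_disputes | | ✓ | | |
| moral_scenarios | | ✓ | | |
| nutrition | | | | ✓ |
| philosophy | | ✓ | | |
| prehistory | | ✓ | | |
| professional_accounting | | | | ✓ |
| professional_law | | ✓ | | |
| professional_medicine | | | | ✓ |
| professional_psychology | | | ✓ | |
| public_relations | | | ✓ | |
| security_studies | | | ✓ | |
| sociology | | | ✓ | |
| us_foreign_policy | | | ✓ | |
| virology | | | | ✓ |
| world_religions | | ✓ | | |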

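Each item is a dictionary with the fields Question, A, B, C, D, and Answer, where A, B, C, and D are the answer options.

Examples

| Question | A | B | C | D | Answer |
|---|---|---|---|---|---|
| مجموعة فرعية H من مجموعة (G،*) هي مجموعة إذا | a، b في H => a * b في H | a في H => a^-1 في H | a، b في H => a * b^-1 في H | H يحتوي على العنصر المحدد | C |
| ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8 | 2 | 4 | 8 | 12 | C |
| ما هو الدرجة لتمديد الحقل المعطى Q(sqrt(2) + sqrt(3)) على Q | 0 | 4 | 2 | 6 | B |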
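
To make the item structure concrete, here is a small sketch of how one item might be turned into a multiple-choice prompt and how a predicted letter could be checked against the `Answer` field. The prompt layout (lettered lines followed by `Answer:`) and the helper names `format_prompt` and `is_correct` are illustrative assumptions, not the template used by the dataset authors or by lm-eval.

```python
from typing import Mapping

CHOICES = ("A", "B", "C", "D")


def format_prompt(item: Mapping[str, str]) -> str:
    """Render one item as the question followed by four lettered options."""
    lines = [item["Question"]]
    lines += [f"{letter}. {item[letter]}" for letter in CHOICES]
    lines.append("Answer:")  # assumed cue; a real harness may use another
    return "\n".join(lines)


def is_correct(item: Mapping[str, str], predicted: str) -> bool:
    """Compare a predicted letter with the gold answer, ignoring case and spaces."""
    return predicted.strip().upper() == item["Answer"]


# Second example row from the table above.
example = {
    "Question": "ما هو ترتيب العنصر (4، 2) من Z_12 x Z_8",
    "A": "2",
    "B": "4",
    "C": "8",
    "D": "12",
    "Answer": "C",
}

print(format_prompt(example))
print(is_correct(example, "c"))
```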

Dataset size

The size of each subject is as follows:

| Subject | Test Length | Eval Length |
|---|---|---|
| professional_law | 1534 | 5 |
| moral_scenarios | 895 | 5 |
| miscellaneous | 783 | 5 |
| professional_psychology | 612 | 5 |
| high_school_psychology | 545 | 5 |
| high_school_macroeconomics | 390 | 5 |
| elementary_mathematics | 378 | 5 |
| moral_disputes | 346 | 5 |
| prehistory | 324 | 5 |
| philosophy | 311 | 5 |
| high_school_biology | 310 | 5 |
| nutrition | 306 | 5 |
| professional_accounting | 282 | 5 |
| professional_medicine | 272 | 5 |
| high_school_mathematics | 270 | 5 |
| clinical_knowledge | 265 | 5 |
| security_studies | 245 | 5 |
| high_school_microeconomics | 238 | 5 |
| high_school_world_history | 237 | 5 |
| conceptual_physics | 235 | 5 |
| marketing | 234 | 5 |
| human_aging | 223 | 5 |
| high_school_statistics | 216 | 5 |
| high_school_us_history | 204 | 5 |
| high_school_chemistry | 203 | 5 |
| sociology | 201 | 5 |
| high_school_geography | 198 | 5 |
| high_school_government_and_politics | 193 | 5 |
| college_medicine | 173 | 5 |
| world_religions | 171 | 5 |
| virology | 166 | 5 |
| high_school_european_history | 165 | 5 |
| logical_fallacies | 163 | 5 |
| astronomy | 152 | 5 |
| high_school_physics | 151 | 5 |
| electrical_engineering | 145 | 5 |
| college_biology | 144 | 5 |
| anatomy | 135 | 5 |
| human_sexuality | 131 | 5 |
| formal_logic | 126 | 5 |
| international_law | 121 | 5 |
| econometrics | 114 | 5 |
| machine_learning | 112 | 5 |
| public_relations | 110 | 5 |
| jurisprudence | 108 | 5 |
| management | 103 | 5 |
| college_physics | 102 | 5 |
| abstract_algebra | 100 | 5 |
| business_ethics | 100 | 5 |
| college_chemistry | 100 | 5 |
| college_computer_science | 100 | 5 |
| college_mathematics | 100 | 5 |
| computer_security | 100 | 5 |
| global_facts | 100 | 5 |
| high_school_computer_science | 100 | 5 |
| medical_genetics | 100 | 5 |
| us_foreign_policy | 100 | 5 |
| Total | 14042 | 285 |
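
Because the test lengths range from 100 to 1534, the way per-subject scores are averaged can change the headline number. The sketch below contrasts a macro average (every subject counts equally) with a micro average (every question counts equally). The two test lengths come from the table above; the correct-answer counts are invented purely for illustration, and the source does not say which averaging the authors use.

```python
from typing import Mapping


def macro_accuracy(correct: Mapping[str, int], total: Mapping[str, int]) -> float:
    """Mean of per-subject accuracies: each subject has equal weight."""
    return sum(correct[s] / total[s] for s in total) / len(total)


def micro_accuracy(correct: Mapping[str, int], total: Mapping[str, int]) -> float:
    """Overall fraction correct: each question has equal weight."""
    return sum(correct[s] for s in total) / sum(total.values())


# Test lengths taken from the size table.
test_length = {"professional_law": 1534, "abstract_algebra": 100}
# Hypothetical counts of correct answers, made up for this example.
hypothetical_correct = {"professional_law": 767, "abstract_algebra": 90}

print(f"macro: {macro_accuracy(hypothetical_correct, test_length):.3f}")
print(f"micro: {micro_accuracy(hypothetical_correct, test_length):.3f}")
```

With these made-up counts the macro average sits well above the micro average, because the large professional_law subject dominates the question-weighted score.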