five

HAERAE-HUB/KMMLU-HARD

收藏
Hugging Face2024-03-09 更新2024-03-04 收录
下载链接:
https://hf-mirror.com/datasets/HAERAE-HUB/KMMLU-HARD
下载链接
链接失效反馈
官方服务:
资源简介:
--- configs: - config_name: maritime_engineering data_files: - split: dev path: data/maritime_engineering-dev.csv - split: test path: data/maritime_engineering-hard-test.csv - config_name: materials_engineering data_files: - split: dev path: data/materials_engineering-dev.csv - split: test path: data/materials_engineering-hard-test.csv - config_name: railway_and_automotive_engineering data_files: - split: dev path: data/railway_and_automotive_engineering-dev.csv - split: test path: data/railway_and_automotive_engineering-hard-test.csv - config_name: biology data_files: - split: dev path: data/biology-dev.csv - split: test path: data/biology-hard-test.csv - config_name: public_safety data_files: - split: dev path: data/public_safety-dev.csv - split: test path: data/public_safety-hard-test.csv - config_name: criminal_law data_files: - split: dev path: data/criminal_law-dev.csv - split: test path: data/criminal_law-hard-test.csv - config_name: information_technology data_files: - split: dev path: data/information_technology-dev.csv - split: test path: data/information_technology-hard-test.csv - config_name: geomatics data_files: - split: dev path: data/geomatics-dev.csv - split: test path: data/geomatics-hard-test.csv - config_name: management data_files: - split: dev path: data/management-dev.csv - split: test path: data/management-hard-test.csv - config_name: math data_files: - split: dev path: data/math-dev.csv - split: test path: data/math-hard-test.csv - config_name: accounting data_files: - split: dev path: data/accounting-dev.csv - split: test path: data/accounting-hard-test.csv - config_name: chemistry data_files: - split: dev path: data/chemistry-dev.csv - split: test path: data/chemistry-hard-test.csv - config_name: nondestructive_testing data_files: - split: dev path: data/nondestructive_testing-dev.csv - split: test path: data/nondestructive_testing-hard-test.csv - config_name: computer_science data_files: - split: dev path: data/computer_science-dev.csv - split: test path: data/computer_science-hard-test.csv - config_name: ecology data_files: - split: dev path: data/ecology-dev.csv - split: test path: data/ecology-hard-test.csv - config_name: health data_files: - split: dev path: data/health-dev.csv - split: test path: data/health-hard-test.csv - config_name: political_science_and_sociology data_files: - split: dev path: data/political_science_and_sociology-dev.csv - split: test path: data/political_science_and_sociology-hard-test.csv - config_name: patent data_files: - split: dev path: data/patent-dev.csv - split: test path: data/patent-hard-test.csv - config_name: electrical_engineering data_files: - split: dev path: data/electrical_engineering-dev.csv - split: test path: data/electrical_engineering-hard-test.csv - config_name: electronics_engineering data_files: - split: dev path: data/electronics_engineering-dev.csv - split: test path: data/electronics_engineering-hard-test.csv - config_name: korean_history data_files: - split: dev path: data/korean_history-dev.csv - split: test path: data/korean_history-hard-test.csv - config_name: gas_technology_and_engineering data_files: - split: dev path: data/gas_technology_and_engineering-dev.csv - split: test path: data/gas_technology_and_engineering-hard-test.csv - config_name: machine_design_and_manufacturing data_files: - split: dev path: data/machine_design_and_manufacturing-dev.csv - split: test path: data/machine_design_and_manufacturing-hard-test.csv - config_name: chemical_engineering data_files: - split: dev path: data/chemical_engineering-dev.csv - split: test path: data/chemical_engineering-hard-test.csv - config_name: telecommunications_and_wireless_technology data_files: - split: dev path: data/telecommunications_and_wireless_technology-dev.csv - split: test path: data/telecommunications_and_wireless_technology-hard-test.csv - config_name: food_processing data_files: - split: dev path: data/food_processing-dev.csv - split: test path: data/food_processing-hard-test.csv - config_name: social_welfare data_files: - split: dev path: data/social_welfare-dev.csv - split: test path: data/social_welfare-hard-test.csv - config_name: real_estate data_files: - split: dev path: data/real_estate-dev.csv - split: test path: data/real_estate-hard-test.csv - config_name: marketing data_files: - split: dev path: data/marketing-dev.csv - split: test path: data/marketing-hard-test.csv - config_name: mechanical_engineering data_files: - split: dev path: data/mechanical_engineering-dev.csv - split: test path: data/mechanical_engineering-hard-test.csv - config_name: fashion data_files: - split: dev path: data/fashion-dev.csv - split: test path: data/fashion-hard-test.csv - config_name: psychology data_files: - split: dev path: data/psychology-dev.csv - split: test path: data/psychology-hard-test.csv - config_name: taxation data_files: - split: dev path: data/taxation-dev.csv - split: test path: data/taxation-hard-test.csv - config_name: environmental_science data_files: - split: dev path: data/environmental_science-dev.csv - split: test path: data/environmental_science-hard-test.csv - config_name: refrigerating_machinery data_files: - split: dev path: data/refrigerating_machinery-dev.csv - split: test path: data/refrigerating_machinery-hard-test.csv - config_name: education data_files: - split: dev path: data/education-dev.csv - split: test path: data/education-hard-test.csv - config_name: industrial_engineer data_files: - split: dev path: data/industrial_engineer-dev.csv - split: test path: data/industrial_engineer-hard-test.csv - config_name: civil_engineering data_files: - split: dev path: data/civil_engineering-dev.csv - split: test path: data/civil_engineering-hard-test.csv - config_name: energy_management data_files: - split: dev path: data/energy_management-dev.csv - split: test path: data/energy_management-hard-test.csv - config_name: law data_files: - split: dev path: data/law-dev.csv - split: test path: data/law-hard-test.csv - config_name: agricultural_sciences data_files: - split: dev path: data/agricultural_sciences-dev.csv - split: test path: data/agricultural_sciences-hard-test.csv - config_name: interior_architecture_and_design data_files: - split: dev path: data/interior_architecture_and_design-dev.csv - split: test path: data/interior_architecture_and_design-hard-test.csv - config_name: aviation_engineering_and_maintenance data_files: - split: dev path: data/aviation_engineering_and_maintenance-dev.csv - split: test path: data/aviation_engineering_and_maintenance-hard-test.csv - config_name: construction data_files: - split: dev path: data/construction-dev.csv - split: test path: data/construction-hard-test.csv - config_name: economics data_files: - split: dev path: data/economics-dev.csv - split: test path: data/economics-hard-test.csv license: cc-by-nd-4.0 task_categories: - question-answering language: - ko tags: - haerae - mmlu size_categories: - 100K<n<1M --- ### KMMLU (Korean-MMLU) We propose KMMLU, a new Korean benchmark with 35,030 expert-level multiple-choice questions across 45 subjects ranging from humanities to STEM. Unlike previous Korean benchmarks that are translated from existing English benchmarks, KMMLU is collected from original Korean exams, capturing linguistic and cultural aspects of the Korean language. We test 26 publically available and proprietary LLMs, identifying significant room for improvement. The best publicly available model achieves 50.54% on KMMLU, far below the average human performance of 62.6%. This model was primarily trained for English and Chinese, not Korean. Current LLMs tailored to Korean, such as Polyglot-Ko, perform far worse. Surprisingly, even the most capable proprietary LLMs, e.g., GPT-4 and HyperCLOVA X, achieve 59.95% and 53.40%, respectively. This suggests that further work is needed to improve Korean LLMs, and KMMLU offers the right tool to track this progress. We make our dataset publicly available on the Hugging Face Hub and integrate the benchmark into EleutherAI's Language Model Evaluation Harness. Link to Paper: [KMMLU: Measuring Massive Multitask Language Understanding in Korean](https://arxiv.org/abs/2402.11548) ### KMMLU Statistics | Category | # Questions | |------------------------------|-------------| | **Prerequisites** | | | None | 59,909 | | 1 Prerequisite Test | 12,316 | | 2 Prerequisite Tests | 776 | | 2+ Years of Experience | 65,135 | | 4+ Years of Experience | 98,678 | | 9+ Years of Experience | 6,963 | | **Question Type** | | | Positive | 207,030 | | Negation | 36,777 | | **Split** | | | Train | 208,522 | | Validation | 225 | | Test | 35,030 | | **Total** | 243,777 | ### Categories To reimplement the categories in the paper, refer to the following: ``` supercategories = { "accounting": "HUMSS", "agricultural_sciences": "Other", "aviation_engineering_and_maintenance": "Applied Science", "biology": "STEM", "chemical_engineering": "STEM", "chemistry": "STEM", "civil_engineering": "STEM", "computer_science": "STEM", "construction": "Other", "criminal_law": "HUMSS", "ecology": "STEM", "economics": "HUMSS", "education": "HUMSS", "electrical_engineering": "STEM", "electronics_engineering": "Applied Science", "energy_management": "Applied Science", "environmental_science": "Applied Science", "fashion": "Other", "food_processing": "Other", "gas_technology_and_engineering": "Applied Science", "geomatics": "Applied Science", "health": "Other", "industrial_engineer": "Applied Science", "information_technology": "STEM", "interior_architecture_and_design": "Other", "law": "HUMSS", "machine_design_and_manufacturing": "Applied Science", "management": "HUMSS", "maritime_engineering": "Applied Science", "marketing": "Other", "materials_engineering": "STEM", "mechanical_engineering": "STEM", "nondestructive_testing": "Applied Science", "patent": "Other", "political_science_and_sociology": "HUMSS", "psychology": "HUMSS", "public_safety": "Other", "railway_and_automotive_engineering": "Applied Science", "real_estate": "Other", "refrigerating_machinery": "Other", "social_welfare": "HUMSS", "taxation": "HUMSS", "telecommunications_and_wireless_technology": "Applied Science", "korean_history": "HUMSS", "math": "STEM" } ``` ### Point of Contact For any questions contact us via the following email:) ``` spthsrbwls123@yonsei.ac.kr ```
提供机构:
HAERAE-HUB
原始信息汇总

数据集概述

数据集名称

KMMLU (Korean-MMLU)

数据集描述

KMMLU是一个包含35,030个专家级别的多项选择题的韩国基准测试,涵盖从人文到STEM的45个学科。与之前的韩国基准测试不同,KMMLU是从原始的韩国考试中收集的,捕捉了韩语的语言和文化特点。

数据集配置

数据集包含以下配置:

  • maritime_engineering
  • materials_engineering
  • railway_and_automotive_engineering
  • biology
  • public_safety
  • criminal_law
  • information_technology
  • geomatics
  • management
  • math
  • accounting
  • chemistry
  • nondestructive_testing
  • computer_science
  • ecology
  • health
  • political_science_and_sociology
  • patent
  • electrical_engineering
  • electronics_engineering
  • korean_history
  • gas_technology_and_engineering
  • machine_design_and_manufacturing
  • chemical_engineering
  • telecommunications_and_wireless_technology
  • food_processing
  • social_welfare
  • real_estate
  • marketing
  • mechanical_engineering
  • fashion
  • psychology
  • taxation
  • environmental_science
  • refrigerating_machinery
  • education
  • industrial_engineer
  • civil_engineering
  • energy_management
  • law
  • agricultural_sciences
  • interior_architecture_and_design
  • aviation_engineering_and_maintenance
  • construction
  • economics

数据文件

每个配置包含两个数据文件:

  • dev 文件:用于开发集
  • hard-test 文件:用于测试集

数据集统计

类别 问题数量
前置条件
59,909
1个前置测试 12,316
2个前置测试 776
2年以上经验 65,135
4年以上经验 98,678
9年以上经验 6,963
问题类型
正面 207,030
否定 36,777
分割
训练 208,522
验证 225
测试 35,030
总计 243,777

类别分类

数据集的类别分为以下超类别:

  • HUMSS
  • STEM
  • Applied Science
  • Other

许可证

cc-by-nd-4.0

任务类别

  • 问答

语言

  • 韩语

标签

  • haerae
  • mmlu

数据集大小

  • 100K<n<1M
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作