M3KE (Massive Multi-Level Multi-Subject Knowledge Evaluation)
OpenDataLab | Updated 2026-05-24 | Indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/M3KE
Resource description:
M3KE is a large-scale, multi-level, multi-subject knowledge evaluation benchmark developed to measure the knowledge acquired by Chinese large language models (LLMs) in zero-shot and few-shot settings. We collected 20,477 questions from 71 tasks. The selection covers all major levels of the Chinese education system, from primary school to university, and a wide range of subjects, including humanities, history, politics, law, education, psychology, science, technology, art, and religion. All questions are multiple-choice with four options, which ensures a standardized and unified evaluation process.
We have evaluated, and will continue to evaluate, numerous Chinese LLMs on this benchmark. The models evaluated so far are either only pre-trained on massive data, or pre-trained and then fine-tuned with SFT or RLHF. Model sizes range from 335M to 175B parameters.
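As an illustration of the zero-shot and few-shot settings described above, the following is a minimal sketch of how a four-option question could be turned into a prompt and how accuracy could be scored. It is not the official M3KE evaluation code: the `Item` fields, the prompt wording, and the helper names `format_item`, `build_prompt`, and `accuracy` are assumptions made for this example, and the real dataset files may use a different layout.

```python
from dataclasses import dataclass
from typing import List, Sequence

LABELS = "ABCD"  # four options per question, as stated in the description


@dataclass
class Item:
    """Hypothetical record layout; the real M3KE files may differ."""
    question: str
    options: List[str]  # exactly four option texts, in A-D order
    answer: str         # gold label: "A", "B", "C", or "D"


def format_item(item: Item, with_answer: bool) -> str:
    """Render one question; include the gold answer only for few-shot examples."""
    lines = [f"Question: {item.question}"]
    lines += [f"{label}. {text}" for label, text in zip(LABELS, item.options)]
    lines.append(f"Answer: {item.answer}" if with_answer else "Answer:")
    return "\n".join(lines)


def build_prompt(target: Item, shots: Sequence[Item] = ()) -> str:
    """Zero-shot when `shots` is empty, few-shot otherwise."""
    blocks = [format_item(shot, True) for shot in shots]
    blocks.append(format_item(target, False))
    return "\n\n".join(blocks)


def accuracy(predictions: Sequence[str], items: Sequence[Item]) -> float:
    """Fraction of predictions whose first letter matches the gold label."""
    if not items:
        return 0.0
    correct = sum(
        pred.strip().upper()[:1] == item.answer
        for pred, item in zip(predictions, items)
    )
    return correct / len(items)
```

Calling `build_prompt(target)` gives a zero-shot prompt, while `build_prompt(target, shots)` prepends solved examples for the few-shot case. Because every question has the same four-option shape, one accuracy function can serve all tasks.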
Provider:
OpenDataLab
Created:
2023-09-04
Collected Summary
Dataset Introduction

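Background and Challenges
Background Overview
M3KE is a large-scale, multi-level, multi-subject knowledge evaluation benchmark designed to assess the knowledge of Chinese large language models in zero-shot and few-shot settings. It contains 20,477 multiple-choice questions spanning education levels from primary school to university and many subjects, enabling standardized evaluation.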
The content above was collected and summarized by 遇见数据集.



