
MMLU

ModelScope (魔搭社区), updated 2025-11-07, indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/sbintuitions/MMLU
Resource overview:
A clone published to keep evaluation scores reproducible and to host the SB Intuitions revised version.

Source: [cais/mmlu on Hugging Face](https://huggingface.co/datasets/cais/mmlu)

# Measuring Massive Multitask Language Understanding (MMLU)

> This is a massive multitask test consisting of multiple-choice questions from various branches of knowledge.
> The test spans subjects in the humanities, social sciences, hard sciences, and other areas that are important for some people to learn.
> This covers 57 tasks including elementary mathematics, US history, computer science, law, and more.
> To attain high accuracy on this test, models must possess extensive world knowledge and problem solving ability.

## Licensing Information

[MIT License](https://choosealicense.com/licenses/mit/)

## Citation Information

```
@article{hendryckstest2021,
  title={Measuring Massive Multitask Language Understanding},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}

@article{hendrycks2021ethics,
  title={Aligning AI With Shared Human Values},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}
```

# Subsets

## default

- `qid` (`str`): ID that uniquely identifies a question within the dataset
- `subject` (`str`): the question's [subcategory](https://github.com/hendrycks/test/blob/master/categories.py#L1); 57 in total
- `tag` (`str`): the [category](https://github.com/hendrycks/test/blob/master/categories.py#L61C1-L61C11) that groups the 57 subcategories; 4 in total. Uses the [naming from lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/README.md)
- `description` (`str`): the system description of the input prompt, set per `subject`. Taken from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/README.md)
- `question` (`str`): the question text
- `choices` (`list[str]`): the answer options (4 per question)
- `answer` (`int`): index (0-3) of the correct option in `choices`

## wo_label_bias

- A version in which the options (`choices`) are reordered so that the correct-answer labels are not skewed, even when examined per `subject`
- split: dev only
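As a rough illustration of how the fields above fit together, the following Python sketch renders one record as a multiple-choice prompt and counts correct-answer indices per `subject`, which is the kind of skew the `wo_label_bias` subset is described as removing. The prompt layout, the helper names (`format_prompt`, `answer_letter`, `label_counts_by_subject`), and every value in the sample record are assumptions made for this example; they are not taken from the dataset or from lm-evaluation-harness. Loading the data itself is not shown.

```python
from collections import Counter, defaultdict

LETTERS = "ABCD"  # assumed letter labels for the four options


def format_prompt(record: dict) -> str:
    """Render one record as a zero-shot prompt (hypothetical layout)."""
    lines = [record["description"].strip(), "", record["question"].strip()]
    for letter, choice in zip(LETTERS, record["choices"]):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def answer_letter(record: dict) -> str:
    """Map the 0-3 `answer` index to its letter label."""
    return LETTERS[record["answer"]]


def label_counts_by_subject(records) -> dict:
    """Count how often each answer index is correct, grouped by `subject`."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["subject"]][record["answer"]] += 1
    return {subject: dict(counter) for subject, counter in counts.items()}


# A made-up record that follows the documented schema; all values are invented.
example = {
    "qid": "example-0",
    "subject": "elementary_mathematics",
    "tag": "stem",
    "description": "The following are multiple choice questions about elementary mathematics.",
    "question": "What is 7 + 5?",
    "choices": ["10", "11", "12", "13"],
    "answer": 2,
}

prompt = format_prompt(example)
print(prompt)
print(answer_letter(example))
print(label_counts_by_subject([example]))
```

On a real split, a strongly uneven count for one index within a subject would suggest that a model could gain accuracy from position preference alone, which may be why a reordered variant is provided.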

Provider:
maas
Created:
2025-02-13