sam-paech/mmlu-pro-irt-1-0
MMLU-Pro-IRT Dataset Overview
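Dataset Information
Features
- question_id: question ID; data type int64
- question: question text; data type string
- options: answer options; data type sequence of string
- answer: answer; data type string
- answer_index: answer index; data type int64
- cot_content: chain-of-thought (CoT) content; data type string
- category: category; data type string
- src: source; data type string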
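As a quick illustration of the schema listed above, the following minimal sketch checks one record against the stated field types. The helper name validate_record and the example record are hypothetical, and the range check assumes that answer_index is a 0-based position into options, which the feature list itself does not state.

```python
# Expected Python types for each field, taken from the feature list.
SCHEMA = {
    "question_id": int,
    "question": str,
    "options": list,
    "answer": str,
    "answer_index": int,
    "cot_content": str,
    "category": str,
    "src": str,
}


def validate_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for field, expected in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"{field}: expected {expected.__name__}")
    options = record.get("options")
    if isinstance(options, list):
        if not all(isinstance(o, str) for o in options):
            problems.append("options: every element should be a string")
        idx = record.get("answer_index")
        # Assumption: answer_index is a 0-based position into options.
        if isinstance(idx, int) and not 0 <= idx < len(options):
            problems.append("answer_index: out of range for options")
    return problems


# Hypothetical record, invented for illustration only.
example = {
    "question_id": 1,
    "question": "What is 2 + 2?",
    "options": ["3", "4", "5"],
    "answer": "B",
    "answer_index": 1,
    "cot_content": "",
    "category": "math",
    "src": "example",
}

print(validate_record(example))
```

A check like this can be run over each split after loading to catch malformed rows before an evaluation run.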
Data Splits
- test: test set; 2059 examples, 1203099 bytes
- validation: validation set; 70 examples, 61129 bytes
Dataset Size
- Download size: 658566 bytes
- Total dataset size: 1264228 bytes
Configuration
- config_name: default
- data_files:
  - test: path data/test-*
  - validation: path data/validation-*
License
- license: MIT
Tags
- MMLU-Pro
- IRT
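Dataset Description
- Source: This dataset is a subset of MMLU-Pro selected using Item Response Theory (IRT); it contains 2059 examples.
- Purpose: The subset is intended to separate scores more clearly across the ability range, so that model scores are more spread out instead of clustering at the bottom of the score range.
- Evaluation: The dataset is suited to evaluation with Eleuther LM-Eval; a run takes about 6 minutes.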
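To make the idea of IRT-based item selection more concrete, here is a minimal sketch under stated assumptions. The description does not say which IRT model or selection rule was used to build this subset; the two-parameter logistic (2PL) model, the item names, and the parameter values below are all illustrative assumptions. The sketch shows that an item is most informative for models whose ability is near the item's difficulty, and that low-discrimination items contribute little at any ability level.

```python
import math


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a model with ability theta answers the item correctly.

    a is the item's discrimination, b is its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


# Hypothetical item parameters (discrimination a, difficulty b).
items = {
    "easy": (2.0, -1.0),
    "medium": (1.5, 0.0),
    "hard": (2.0, 1.0),
    "flat": (0.3, 0.0),  # low discrimination: weakly informative everywhere
}

# For each ability level, find the item that separates models best there.
for theta in (-1.0, 0.0, 1.0):
    best = max(items, key=lambda name: item_information(theta, *items[name]))
    print(f"theta={theta:+.1f}: most informative item is {best}")
```

Under these assumptions, a subset whose item difficulties cover the whole ability range keeps some informative items for every model, which is one way scores could end up more spread out.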
References
- MMLU-Pro:
@misc{wang2024mmlupro, title={MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark}, author={Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen}, year={2024}, eprint={2406.01574}, archivePrefix={arXiv}, primaryClass={cs.CL} }
- MMLU:
@article{hendryckstest2021, title={Measuring Massive Multitask Language Understanding}, author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt}, journal={Proceedings of the International Conference on Learning Representations (ICLR)}, year={2021} }
@article{hendrycks2021ethics, title={Aligning AI With Shared Human Values}, author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt}, journal={Proceedings of the International Conference on Learning Representations (ICLR)}, year={2021} }



