sam-paech/mmlu-pro-irt-1-0

Name: sam-paech/mmlu-pro-irt-1-0
Creator: sam-paech
Published: 2024-07-05 23:41:13
License: MIT

Source: Hugging Face. Updated 2024-07-05; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/sam-paech/mmlu-pro-irt-1-0

Overview:

The MMLU-Pro-IRT dataset is a subset of MMLU-Pro whose items were selected with Item Response Theory (IRT) to better separate scores across the ability range. It contains 2059 items, compared with 12000 in the full MMLU-Pro dataset, so it is faster to run. It is intended for evaluating language models, particularly when Chain-of-Thought (CoT) prompting is not used, because the IRT-selected questions discriminate better between models of different ability levels.
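
The selection idea can be illustrated with a minimal sketch. The source does not say which IRT model or selection rule was used, so the two-parameter logistic (2PL) model, the Fisher-information criterion, the function names, and every item parameter below are assumptions chosen purely for illustration.

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL item response function: P(correct | ability theta).

    a is the item's discrimination, b its difficulty (both hypothetical here).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_items(items, thetas, k):
    """Keep the k items with the highest information summed over the ability grid."""
    def total_info(item):
        _, a, b = item
        return sum(item_information(t, a, b) for t in thetas)
    return sorted(items, key=total_info, reverse=True)[:k]

# Hypothetical (item_id, discrimination, difficulty) triples,
# not real MMLU-Pro parameter estimates.
items = [
    ("q1", 0.3, 0.0),   # weakly discriminating
    ("q2", 1.8, 0.0),   # sharp, medium difficulty
    ("q3", 1.5, 6.0),   # so hard that almost every model fails it
    ("q4", 1.2, -0.5),  # reasonably sharp, slightly easy
]
ability_grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
selected = [item_id for item_id, _, _ in select_items(items, ability_grid, k=2)]
print(selected)  # → ['q2', 'q4']
```

In this toy setting the weakly discriminating item and the near-impossible item carry almost no information over the ability grid, which is one plausible reason a filtered subset would spread scores out more than the full benchmark.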

Provided by:

sam-paech

Summary of original information

MMLU-Pro-IRT dataset overview

Dataset information

Features

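question_id: question ID, data type int64
question: question text, data type string
options: answer options, a sequence of string
answer: answer, data type string
answer_index: index of the answer, data type int64
cot_content: chain-of-thought content, data type string
category: category, data type string
src: source, data type string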
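
As a rough sketch of how a record with the features listed above might be handled, the snippet below models one row and renders it as a multiple-choice prompt without chain-of-thought. Several things here are assumptions rather than facts from the source: that answer holds a letter, that answer_index is 0-based, that there are at most ten options, and the prompt layout itself. The Item class, both helper functions, and the sample record are hypothetical.

```python
from dataclasses import dataclass
from typing import List

LETTERS = "ABCDEFGHIJ"  # assumes at most ten options per question

@dataclass
class Item:
    question_id: int
    question: str
    options: List[str]
    answer: str
    answer_index: int
    cot_content: str
    category: str
    src: str

def is_consistent(item: Item) -> bool:
    """Check that the answer letter and the 0-based answer_index agree."""
    return (
        0 <= item.answer_index < len(item.options) <= len(LETTERS)
        and LETTERS[item.answer_index] == item.answer
    )

def format_prompt(item: Item) -> str:
    """Render a plain multiple-choice prompt without chain-of-thought."""
    lines = [f"Question: {item.question}"]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(item.options)]
    lines.append("Answer:")
    return "\n".join(lines)

# A made-up record for illustration; it is not taken from the dataset.
example = Item(
    question_id=0,
    question="Which planet is closest to the Sun?",
    options=["Venus", "Mercury", "Mars"],
    answer="B",
    answer_index=1,
    cot_content="",
    category="physics",
    src="hypothetical",
)
print(is_consistent(example))
print(format_prompt(example))
```

A consistency check of this kind is cheap to run over every row before scoring, since a mismatch between the letter and the index would silently corrupt accuracy figures.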

Data splits

test: test split, 2059 samples, 1203099 bytes
validation: validation split, 70 samples, 61129 bytes

Dataset size

Download size: 658566 bytes
Total dataset size: 1264228 bytes
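
The reported sizes can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and reading the download figure as a compressed size relative to the total is an assumption about how these fields are usually reported, not something the listing states.

```python
# Figures copied from the listing above.
split_bytes = {"test": 1_203_099, "validation": 61_129}
split_rows = {"test": 2_059, "validation": 70}
download_bytes = 658_566
dataset_bytes = 1_264_228

# The per-split sizes should add up to the reported total.
total = sum(split_bytes.values())
print(total == dataset_bytes)  # → True

# Download size relative to total size, and mean bytes per test row.
compression_ratio = download_bytes / dataset_bytes
bytes_per_test_row = split_bytes["test"] / split_rows["test"]
print(f"{compression_ratio:.2f}", round(bytes_per_test_row))  # → 0.52 584
```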

Configuration

config_name: default
- data_files:
  - test: path data/test-*
  - validation: path data/validation-*
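
To show how glob patterns like these assign files to splits, here is a small sketch. The standard-library fnmatch stands in for whatever matcher the hosting library actually uses, and the shard file names are hypothetical; the real names in the repository may differ.

```python
from fnmatch import fnmatch

# Glob patterns from the configuration above.
data_files = {"test": "data/test-*", "validation": "data/validation-*"}

def split_for(path: str):
    """Return the split whose pattern matches path, or None if none does."""
    for split, pattern in data_files.items():
        if fnmatch(path, pattern):
            return split
    return None

# Hypothetical shard names, used only to exercise the patterns.
repo_files = [
    "README.md",
    "data/test-00000-of-00001.parquet",
    "data/validation-00000-of-00001.parquet",
]
assignment = {path: split_for(path) for path in repo_files}
print(assignment)
```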

License

license: MIT

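Dataset description

Source: a subset of MMLU-Pro selected with Item Response Theory, containing 2059 samples.
Purpose: the subset is intended to better separate scores across the ability range, so that model scores are more spread out instead of bunching at the bottom of the score range.
Evaluation: the dataset is suited to evaluation with Eleuther LM-Eval; a run takes about 6 minutes.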

References

MMLU-Pro:

@misc{wang2024mmlupro,
  title={MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark},
  author={Yubo Wang and Xueguang Ma and Ge Zhang and Yuansheng Ni and Abhranil Chandra and Shiguang Guo and Weiming Ren and Aaran Arulraj and Xuan He and Ziyan Jiang and Tianle Li and Max Ku and Kai Wang and Alex Zhuang and Rongqi Fan and Xiang Yue and Wenhu Chen},
  year={2024},
  eprint={2406.01574},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}

MMLU:

@article{hendryckstest2021,
  title={Measuring Massive Multitask Language Understanding},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andy Zou and Mantas Mazeika and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}

@article{hendrycks2021ethics,
  title={Aligning AI With Shared Human Values},
  author={Dan Hendrycks and Collin Burns and Steven Basart and Andrew Critch and Jerry Li and Dawn Song and Jacob Steinhardt},
  journal={Proceedings of the International Conference on Learning Representations (ICLR)},
  year={2021}
}