SuperGPQA
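SuperGPQA Dataset Overview
Dataset Introduction
SuperGPQA is a comprehensive benchmark designed to evaluate graduate-level knowledge and reasoning capabilities across 285 disciplines. The benchmark employs a novel Human-LLM collaborative filtering mechanism that eliminates trivial or ambiguous questions through iterative refinement based on both LLM responses and expert feedback. Experimental results show that state-of-the-art LLMs still have substantial room for improvement across diverse knowledge domains (for example, the reasoning-focused model DeepSeek-R1 achieved the highest accuracy on SuperGPQA, at 61.82%), highlighting the considerable gap between current model capabilities and artificial general intelligence.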
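Dataset Statistics
- Total disciplines: 285
- Sample counts by discipline:
  - Agronomy: 485
  - Economics: 873
  - Education: 484
  - Engineering: 7892
  - History: 674
  - Law: 656
  - Literature and Arts: 1676
  - Management: 501
  - Medicine: 2755
  - Military Science: 205
  - Philosophy: 347
  - Science: 9838
  - Sociology: 143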
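The per-discipline counts above are easier to interpret alongside their total and relative shares. The short sketch below only does arithmetic on the figures listed in this section. The variable names are illustrative and are not part of any SuperGPQA tooling.

```python
# Per-discipline sample counts, copied from the list above.
counts = {
    "Agronomy": 485,
    "Economics": 873,
    "Education": 484,
    "Engineering": 7892,
    "History": 674,
    "Law": 656,
    "Literature and Arts": 1676,
    "Management": 501,
    "Medicine": 2755,
    "Military Science": 205,
    "Philosophy": 347,
    "Science": 9838,
    "Sociology": 143,
}

# Total number of samples implied by the listed counts.
total = sum(counts.values())

# Share of the benchmark contributed by each discipline.
shares = {name: n / total for name, n in counts.items()}

# Discipline with the most samples.
largest = max(counts, key=counts.get)

if __name__ == "__main__":
    for name, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<22}{n:>6}{shares[name]:>8.1%}")
    print(f"{'Total':<22}{total:>6}")
```

The listed counts sum to 26,529 samples, and Science and Engineering together account for roughly two thirds of them, so sample-level averages are dominated by those two categories.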
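Dataset Composition
SuperGPQA comprises multiple subtasks whose questions span different difficulty levels, and it is used to evaluate the performance of a variety of models.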
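Performance Metrics
Performance is reported in two ways: overall performance (aggregated at the sample, subfield, field, and discipline levels) and performance on samples of each difficulty level (easy, medium, hard).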
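One plausible reading of the aggregation levels above is that sample-level accuracy pools all questions, while the subfield, field, and discipline figures average per-group accuracies so that small groups count as much as large ones. The sketch below illustrates that distinction on made-up records. The record keys (`discipline`, `field`, `subfield`, `difficulty`, `correct`), the toy data, and the macro-averaging choice are all assumptions for illustration, not the benchmark's confirmed scoring code.

```python
from collections import defaultdict


def micro_accuracy(records):
    """Sample-level accuracy: every question has equal weight."""
    return sum(r["correct"] for r in records) / len(records)


def accuracy_by(records, key):
    """Accuracy computed separately for each value of `key`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        totals[r[key]] += 1
        hits[r[key]] += int(r["correct"])
    return {group: hits[group] / totals[group] for group in totals}


def macro_accuracy(records, key):
    """Unweighted mean of per-group accuracies (assumed aggregation)."""
    per_group = accuracy_by(records, key)
    return sum(per_group.values()) / len(per_group)


# Hypothetical graded records; the values are invented for this example.
records = [
    {"discipline": "Science", "field": "Physics", "subfield": "Optics",
     "difficulty": "easy", "correct": True},
    {"discipline": "Science", "field": "Physics", "subfield": "Optics",
     "difficulty": "hard", "correct": False},
    {"discipline": "Science", "field": "Physics", "subfield": "Acoustics",
     "difficulty": "medium", "correct": True},
    {"discipline": "Law", "field": "Law", "subfield": "Civil Law",
     "difficulty": "hard", "correct": False},
]

if __name__ == "__main__":
    print("sample:", micro_accuracy(records))
    for level in ("subfield", "field", "discipline"):
        print(f"{level}:", macro_accuracy(records, level))
    print("by difficulty:", accuracy_by(records, "difficulty"))
```

On this toy data the sample-level score and the discipline-level score differ, because the single Law question weighs as much as the three Science questions once accuracies are averaged per discipline.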
Model List
- Reasoning models: DeepSeek-R1, o1-2024-12-17, DeepSeek-R1-Zero, o3-mini-2025-01-31-high, o3-mini-2025-01-31-medium, and others
- Chat models: Doubao-1.5-pro-32k-250115, Doubao-1.5-pro-32k-241225, Qwen-max-2025-01-25, Claude-3-5-sonnet-20241022, Gemini-2.0-flash, and others
- Base models: Qwen2.5-72B, Qwen2.5-32B, DeepSeek-V3-Base, Qwen2.5-14B, Yi-1.5-34B, and others
Related Links
- Homepage: SuperGPQA Homepage
- Dataset: Hugging Face Dataset
- Paper: ArXiv Paper
- Leaderboard: Leaderboard
- GitHub: GitHub Repository
Citation
@misc{pteam2025supergpqascalingllmevaluation,
  title={SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines},
  author={P Team and Xinrun Du and Yifan Yao and Kaijing Ma and Bingli Wang and Tianyu Zheng and Kang Zhu and Minghao Liu and Yiming Liang and Xiaolong Jin and Zhenlin Wei and Chujie Zheng and Kaixing Deng and Shuyue Guo and Shian Jia and Sichao Jiang and Yiyan Liao and Rui Li and Qinrui Li and Sirun Li and Yizhi Li and Yunwen Li and Dehua Ma and Yuansheng Ni and Haoran Que and Qiyao Wang and Zhoufutu Wen and Siwei Wu and Tianshun Xing and Ming Xu and Zhenzhu Yang and Zekun Moore Wang and Junting Zhou and Yuelin Bai and Xingyuan Bu and Chenglin Cai and Liang Chen and Yifan Chen and Chengtuo Cheng and Tianhao Cheng and Keyi Ding and Siming Huang and Yun Huang and Yaoru Li and Yizhe Li and Zhaoqun Li and Tianhao Liang and Chengdong Lin and Hongquan Lin and Yinghao Ma and Zhongyuan Peng and Zifan Peng and Qige Qi and Shi Qiu and Xingwei Qu and Yizhou Tan and Zili Wang and Chenqing Wang and Hao Wang and Yiya Wang and Yubo Wang and Jiajun Xu and Kexin Yang and Ruibin Yuan and Yuanhao Yue and Tianyang Zhan and Chun Zhang and Jingyang Zhang and Xiyue Zhang and Xingjian Zhang and Yue Zhang and Yongchi Zhao and Xiangyu Zheng and Chenghua Zhong and Yang Gao and Zhoujun Li and Dayiheng Liu and Qian Liu and Tianyu Liu and Shiwen Ni and Junran Peng and Yujia Qin and Wenbo Su and Guoyin Wang and Shi Wang and Jian Yang and Min Yang and Meng Cao and Xiang Yue and Zhaoxiang Zhang and Wangchunshu Zhou and Jiaheng Liu and Qunshu Lin and Wenhao Huang and Ge Zhang},
  year={2025},
  eprint={2502.14739},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.14739},
}




