
OlympicArena

ModelScope community · Updated 2025-11-07 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/GAIR/OlympicArena
Description:
# OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI

**OlympicArena** is a comprehensive, highly challenging, and rigorously curated benchmark with a fine-grained evaluation mechanism designed to assess advanced AI capabilities across a broad spectrum of Olympic-level challenges.

The benchmark covers seven disciplines: Mathematics, Physics, Chemistry, Biology, Geography, Astronomy, and Computer Science. Each discipline is divided into two splits: validation (val) and test. The validation split includes publicly available answers for small-scale testing and evaluation. The test split does not disclose the answers; users can submit their results.

# An Example of Loading the Data

```python
from datasets import load_dataset

# Load the validation split of the Math discipline and print the first record.
dataset = load_dataset("GAIR/OlympicArena", "Math", split="val")
print(dataset[0])
```

More details on loading and using the data are available on our [GitHub page](https://github.com/GAIR-NLP/OlympicArena).

If you find our code helpful or use our benchmark dataset, please cite our paper.

```
@article{huang2024olympicarena,
  title={OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI},
  author={Zhen Huang and Zengzhi Wang and Shijie Xia and Xuefeng Li and Haoyang Zou and Ruijie Xu and Run-Ze Fan and Lyumanshan Ye and Ethan Chern and Yixin Ye and Yikai Zhang and Yuqing Yang and Ting Wu and Binjie Wang and Shichao Sun and Yang Xiao and Yiyuan Li and Fan Zhou and Steffi Chern and Yiwei Qin and Yan Ma and Jiadi Su and Yixiu Liu and Yuxiang Zheng and Shaoting Zhang and Dahua Lin and Yu Qiao and Pengfei Liu},
  year={2024},
  journal={arXiv preprint arXiv:2406.12753},
  url={https://arxiv.org/abs/2406.12753}
}
```
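The discipline and split layout described above suggests a simple loop over every (discipline, split) pair. The following is a minimal sketch, not the authors' loader. Only the configuration name `"Math"` and the split name `"val"` appear in the loading example; the other six configuration names in `ASSUMED_CONFIGS` are guesses that follow the same pattern and should be checked against the dataset page. The helper names `config_split_pairs` and `load_all` are hypothetical.

```python
from itertools import product

# Only "Math" is confirmed by the loading example; the remaining names are
# ASSUMPTIONS and may differ from the actual configuration names.
ASSUMED_CONFIGS = [
    "Math", "Physics", "Chemistry", "Biology",
    "Geography", "Astronomy", "CS",
]

# "val" comes from the loading example; "test" is the second split named
# in the description.
SPLITS = ["val", "test"]


def config_split_pairs(configs=ASSUMED_CONFIGS, splits=SPLITS):
    """Return every (configuration, split) combination as a list of tuples."""
    return list(product(configs, splits))


def load_all(repo_id="GAIR/OlympicArena", pairs=None):
    """Load each requested (configuration, split) pair into a dictionary.

    The `datasets` import is deferred so that the pair-building helper
    can be used without the library installed or network access.
    """
    from datasets import load_dataset

    if pairs is None:
        pairs = config_split_pairs()
    return {
        (config, split): load_dataset(repo_id, config, split=split)
        for config, split in pairs
    }
```

Calling `load_all(pairs=[("Math", "val")])` restricts the download to the one pair confirmed by the example. Because the test split does not disclose answers, any scoring code run locally would have to be limited to the `val` entries.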

Provider:
maas
Created:
2025-02-08