OlympiadBench

ModelScope (魔搭社区) | Updated: 2026-01-09 | Indexed: 2025-01-25

Download link:

https://modelscope.cn/datasets/AI-ModelScope/OlympiadBench
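For readers who prefer to fetch the dataset programmatically rather than through the web page, the following is a minimal sketch that assumes the third-party `modelscope` Python SDK is installed and that its `MsDataset.load` entry point accepts this dataset id. The helper names `dataset_page_url` and `load_olympiadbench` are hypothetical and exist only for illustration. The subset and split names are not given on this page, so they are left as required arguments to be looked up on the dataset page.

```python
"""Sketch: fetch OlympiadBench from ModelScope (assumes `pip install modelscope`)."""

# Dataset identifier taken from the download link above.
DATASET_ID = "AI-ModelScope/OlympiadBench"


def dataset_page_url(dataset_id: str = DATASET_ID) -> str:
    """Return the ModelScope web page for a dataset id of the form 'namespace/name'."""
    return f"https://modelscope.cn/datasets/{dataset_id}"


def load_olympiadbench(subset_name: str, split: str):
    """Download one subset/split of the dataset through the ModelScope SDK.

    Hypothetical helper: `subset_name` and `split` must be valid names listed
    on the dataset page; none are assumed here.
    """
    # Imported lazily so this module can be imported without the SDK installed.
    from modelscope.msdatasets import MsDataset

    return MsDataset.load(DATASET_ID, subset_name=subset_name, split=split)
```

Nothing is downloaded until `load_olympiadbench` is called. Keeping the SDK import inside the function means the URL helper can be used on machines where `modelscope` is not installed.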

Description:

# OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems [ACL 2024]

[**📖 arXiv**](https://arxiv.org/abs/2402.14008) | [**GitHub**](https://github.com/OpenBMB/OlympiadBench)

**Note**: We have adjusted the image content in the multimodal portion of the dataset and fixed an earlier issue where some images in the English physics subset were not displayed properly. If your usage involves images, please re-download the dataset (we recommend that all users download the latest version).

Some entries in the solution field may also include images. However, due to image display limitations on Hugging Face, we did not include them in this update. If you need the images embedded in the solution field, please download the full dataset from [**the whole data link**](https://drive.google.com/file/d/1DnTCrvIv5vbfDmi2yYaCWnzUhvD34oB0/view?usp=sharing). This version contains all the original image content.

Thank you for your support of **OlympiadBench**. We hope it is helpful to your work.

## Dataset Description

**OlympiadBench** is an Olympiad-level bilingual multimodal scientific benchmark featuring 8,476 problems from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. Each problem comes with expert-level annotations for step-by-step reasoning. Notably, the best-performing model, GPT-4V, attains an average score of 17.97% on OlympiadBench and only 10.74% in physics, highlighting the benchmark's rigor and the intricacy of physical reasoning.

More details are at our [GitHub](https://github.com/OpenBMB/OlympiadBench).

## Contact

- Chaoqun He: hechaoqun1998@gmail.com

## Citation

If you find our code helpful or use our benchmark dataset, please cite our paper.

**BibTeX:**

```bibtex
@article{he2024olympiadbench,
  title={Olympiadbench: A challenging benchmark for promoting agi with olympiad-level bilingual multimodal scientific problems},
  author={He, Chaoqun and Luo, Renjie and Bai, Yuzhuo and Hu, Shengding and Thai, Zhen Leng and Shen, Junhao and Hu, Jinyi and Han, Xu and Huang, Yujie and Zhang, Yuxiang and others},
  journal={arXiv preprint arXiv:2402.14008},
  year={2024}
}
```

Provider:

maas

Created:

2025-01-22
