introvoyz041/OlympiadBench
Source: Hugging Face | Updated: 2025-12-14 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/introvoyz041/OlympiadBench
Resource description:
OlympiadBench is a bilingual, multimodal scientific benchmark of 8,476 problems drawn from Olympiad-level mathematics and physics competitions, including the Chinese college entrance exam. Each problem comes with expert-level annotations for step-by-step reasoning. Notably, the best-performing model, GPT-4V, attains an average score of only 17.97% on OlympiadBench and just 10.74% on the physics portion, which highlights the benchmark's rigor and the intricacy of physical reasoning.
Provider:
introvoyz041



