OlymMATH-eval
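OlymMATH Evaluation Results: Dataset Overview
Basic information
- Languages: Chinese (zh), English (en)
- Task category: question answering (question-answering)
- License: MIT
Dataset configurations
The dataset contains multiple configurations. Each configuration corresponds to one evaluated model and provides four splits (zh_hard, zh_easy, en_hard, en_easy):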
- qwq-32b
  - Data files:
    - zh_hard: data/qwq-32b/zh-hard.parquet
    - zh_easy: data/qwq-32b/zh-easy.parquet
    - en_hard: data/qwq-32b/en-hard.parquet
    - en_easy: data/qwq-32b/en-easy.parquet
- deepseek-r1-distill-qwen-14b
  - Data files:
    - zh_hard: data/deepseek-r1-distill-qwen-14b/zh-hard.parquet
    - zh_easy: data/deepseek-r1-distill-qwen-14b/zh-easy.parquet
    - en_hard: data/deepseek-r1-distill-qwen-14b/en-hard.parquet
    - en_easy: data/deepseek-r1-distill-qwen-14b/en-easy.parquet
- qwen3-4b
  - Data files:
    - zh_hard: data/qwen3-4b/zh-hard.parquet
    - zh_easy: data/qwen3-4b/zh-easy.parquet
    - en_hard: data/qwen3-4b/en-hard.parquet
    - en_easy: data/qwen3-4b/en-easy.parquet
- deepseek-r1-distill-qwen-7b
  - Data files:
    - zh_hard: data/deepseek-r1-distill-qwen-7b/zh-hard.parquet
    - zh_easy: data/deepseek-r1-distill-qwen-7b/zh-easy.parquet
    - en_hard: data/deepseek-r1-distill-qwen-7b/en-hard.parquet
    - en_easy: data/deepseek-r1-distill-qwen-7b/en-easy.parquet
- qwen3-30b-a3b
  - Data files:
    - zh_hard: data/qwen3-30b-a3b/zh-hard.parquet
    - zh_easy: data/qwen3-30b-a3b/zh-easy.parquet
    - en_hard: data/qwen3-30b-a3b/en-hard.parquet
    - en_easy: data/qwen3-30b-a3b/en-easy.parquet
- openmath-nemotron-14b
  - Data files:
    - zh_hard: data/openmath-nemotron-14b/zh-hard.parquet
    - zh_easy: data/openmath-nemotron-14b/zh-easy.parquet
    - en_hard: data/openmath-nemotron-14b/en-hard.parquet
    - en_easy: data/openmath-nemotron-14b/en-easy.parquet
- deepscaler-1.5b-preview
  - Data files:
    - zh_hard: data/deepscaler-1.5b-preview/zh-hard.parquet
    - zh_easy: data/deepscaler-1.5b-preview/zh-easy.parquet
    - en_hard: data/deepscaler-1.5b-preview/en-hard.parquet
    - en_easy: data/deepscaler-1.5b-preview/en-easy.parquet
- still-3-1.5b-preview
  - Data files:
    - zh_hard: data/still-3-1.5b-preview/zh-hard.parquet
    - zh_easy: data/still-3-1.5b-preview/zh-easy.parquet
    - en_hard: data/still-3-1.5b-preview/en-hard.parquet
    - en_easy: data/still-3-1.5b-preview/en-easy.parquet
- skywork-or1-32b-preview
  - Data files:
    - zh_hard: data/skywork-or1-32b-preview/zh-hard.parquet
    - zh_easy: data/skywork-or1-32b-preview/zh-easy.parquet
    - en_hard: data/skywork-or1-32b-preview/en-hard.parquet
    - en_easy: data/skywork-or1-32b-preview/en-easy.parquet
- openmath-nemotron-7b
  - Data files:
    - zh_hard: data/openmath-nemotron-7b/zh-hard.parquet
    - zh_easy: data/openmath-nemotron-7b/zh-easy.parquet
    - en_hard: data/openmath-nemotron-7b/en-hard.parquet
    - en_easy: data/openmath-nemotron-7b/en-easy.parquet
- light-r1-32b-ds
  - Data files:
    - zh_hard: data/light-r1-32b-ds/zh-hard.parquet
    - zh_easy: data/light-r1-32b-ds/zh-easy.parquet
    - en_hard: data/light-r1-32b-ds/en-hard.parquet
    - en_easy: data/light-r1-32b-ds/en-easy.parquet
- deepseek-r1-distill-qwen-32b
  - Data files:
    - zh_hard: data/deepseek-r1-distill-qwen-32b/zh-hard.parquet
    - zh_easy: data/deepseek-r1-distill-qwen-32b/zh-easy.parquet
    - en_hard: data/deepseek-r1-distill-qwen-32b/en-hard.parquet
    - en_easy: data/deepseek-r1-distill-qwen-32b/en-easy.parquet
- openthinker2-32b
  - Data files:
    - zh_hard: data/openthinker2-32b/zh-hard.parquet
    - zh_easy: data/openthinker2-32b/zh-easy.parquet
    - en_hard: data/openthinker2-32b/en-hard.parquet
    - en_easy: data/openthinker2-32b/en-easy.parquet
- openmath-nemotron-1.5b
  - Data files:
    - zh_hard: data/openmath-nemotron-1.5b/zh-hard.parquet
    - zh_easy: data/openmath-nemotron-1.5b/zh-easy.parquet
    - en_hard: data/openmath-nemotron-1.5b/en-hard.parquet
    - en_easy: data/openmath-nemotron-1.5b/en-easy.parquet
- light-r1-7b-ds
  - Data files:
    - zh_hard: data/light-r1-7b-ds/zh-hard.parquet
    - zh_easy: data/light-r1-7b-ds/zh-easy.parquet
    - en_hard: data/light-r1-7b-ds/en-hard.parquet
    - en_easy: data/light-r1-7b-ds/en-easy.parquet
- light-r1-14b-ds
  - Data files:
    - zh_hard: data/light-r1-14b-ds/zh-hard.parquet
    - zh_easy: data/light-r1-14b-ds/zh-easy.parquet
    - en_hard: data/light-r1-14b-ds/en-hard.parquet
    - en_easy: data/light-r1-14b-ds/en-easy.parquet
- openthinker2-7b
  - Data files:
    - zh_hard: data/openthinker2-7b/zh-hard.parquet
    - zh_easy: data/openthinker2-7b/zh-easy.parquet
    - en_hard: data/openthinker2-7b/en-hard.parquet
    - en_easy: data/openthinker2-7b/en-easy.parquet
- skywork-or1-7b-preview
  - Data files:
    - zh_hard: data/skywork-or1-7b-preview/zh-hard.parquet
    - zh_easy: data/skywork-or1-7b-preview/zh-easy.parquet
    - en_hard: data/skywork-or1-7b-preview/en-hard.parquet
    - en_easy: data/skywork-or1-7b-preview/en-easy.parquet
- deepseek-r1-distill-qwen-1.5b
  - Data files:
    - zh_hard: data/deepseek-r1-distill-qwen-1.5b/zh-hard.parquet
    - zh_easy: data/deepseek-r1-distill-qwen-1.5b/zh-easy.parquet
    - en_hard: data/deepseek-r1-distill-qwen-1.5b/en-hard.parquet
    - en_easy: data/deepseek-r1-distill-qwen-1.5b/en-easy.parquet
- skywork-or1-math-7b
  - Data files:
    - zh_hard: data/skywork-or1-math-7b/zh-hard.parquet
    - zh_easy: data/skywork-or1-math-7b/zh-easy.parquet
    - en_hard: data/skywork-or1-math-7b/en-hard.parquet
    - en_easy: data/skywork-or1-math-7b/en-easy.parquet
- acemath-rl-nemotron-7b
  - Data files:
    - zh_hard: data/acemath-rl-nemotron-7b/zh-hard.parquet
    - zh_easy: data/acemath-rl-nemotron-7b/zh-easy.parquet
    - en_hard: data/acemath-rl-nemotron-7b/en-hard.parquet
    - en_easy: data/acemath-rl-nemotron-7b/en-easy.parquet
- qwen3-0.6b
  - Data files:
    - zh_hard: data/qwen3-0.6b/zh-hard.parquet
    - zh_easy: data/qwen3-0.6b/zh-easy.parquet
    - en_hard: data/qwen3-0.6b/en-hard.parquet
    - en_easy: data/qwen3-0.6b/en-easy.parquet
- qwen3-235b-a22b
  - Data files:
    - zh_hard: data/qwen3-235b-a22b/zh-hard.parquet
    - zh_easy: data/qwen3-235b-a22b/zh-easy.parquet
    - en_hard: data/qwen3-235b-a22b/en-hard.parquet
    - en_easy: data/qwen3-235b-a22b/en-easy.parquet
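Every configuration above follows the same layout: the split name is `<language>_<difficulty>` and the file lives at `data/<config>/<language>-<difficulty>.parquet`. The sketch below rebuilds those paths and shows one possible way to load a split with the Hugging Face `datasets` library. The helper names (`split_name`, `data_file`, `data_files_for`, `load_split`) are hypothetical, and the repository ID `RUCAIBox/OlymMATH-eval` is an assumption that this card does not state; adjust it to wherever the dataset is actually hosted.

```python
from typing import Dict

LANGUAGES = ("zh", "en")
DIFFICULTIES = ("hard", "easy")


def split_name(language: str, difficulty: str) -> str:
    """Return the split name used in the card, e.g. 'zh_hard'."""
    if language not in LANGUAGES:
        raise ValueError(f"unknown language: {language!r}")
    if difficulty not in DIFFICULTIES:
        raise ValueError(f"unknown difficulty: {difficulty!r}")
    return f"{language}_{difficulty}"


def data_file(config: str, language: str, difficulty: str) -> str:
    """Return the Parquet path for one split of one model configuration."""
    split_name(language, difficulty)  # validates both arguments
    return f"data/{config}/{language}-{difficulty}.parquet"


def data_files_for(config: str) -> Dict[str, str]:
    """Map each of the four split names to its Parquet path."""
    return {
        split_name(lang, diff): data_file(config, lang, diff)
        for lang in LANGUAGES
        for diff in DIFFICULTIES
    }


def load_split(config: str, language: str, difficulty: str,
               repo_id: str = "RUCAIBox/OlymMATH-eval"):
    """Load one split from the Hub. The default repo_id is an assumption."""
    # Imported lazily so the path helpers work without the third-party package.
    from datasets import load_dataset

    return load_dataset(repo_id, config, split=split_name(language, difficulty))
```

For example, `data_files_for("qwq-32b")["en_easy"]` yields `data/qwq-32b/en-easy.parquet`, and `load_split("qwq-32b", "en", "easy")` would fetch that split, provided the assumed repository ID is correct and network access is available.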
Related resources
- Paper: Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models
- GitHub: RUCAIBox/OlymMATH
- Demo: OlymMATH Demo
Citation
@misc{sun2025challengingboundariesreasoningolympiadlevel,
  title={Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models},
  author={Haoxiang Sun and Yingqian Min and Zhipeng Chen and Wayne Xin Zhao and Zheng Liu and Zhongyuan Wang and Lei Fang and Ji-Rong Wen},
  year={2025},
  eprint={2503.21380},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.21380},
}




