introvoyz041/Omni-MATH

Name: introvoyz041/Omni-MATH
Creator: introvoyz041
Published: 2026-04-21 05:56:08
License: No description available

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/introvoyz041/Omni-MATH


Resource description:

---
license: apache-2.0
language:
- en
tags:
- math
- olympiads
size_categories:
- 1K<n<10K
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/65ae21adabf6d1ccb795e9a4/2K48kJlYndyPbiwVqwaRj.jpeg)

# Dataset Card for Omni-MATH

Recent advancements in AI, particularly in large language models (LLMs), have led to significant breakthroughs in mathematical reasoning. However, existing benchmarks such as GSM8K and MATH are now solved with high accuracy (e.g., OpenAI o1 achieves 94.8% on the MATH dataset), indicating that they are no longer adequate for challenging these models. To address this limitation, we propose a comprehensive and challenging benchmark specifically designed to assess LLMs' mathematical reasoning at the Olympiad level. Unlike existing Olympiad-related benchmarks, our dataset focuses exclusively on mathematics and comprises 4,428 competition-level problems. These problems are carefully categorized into 33 (and potentially more) sub-domains and span 10 distinct difficulty levels, enabling a fine-grained analysis of model performance across mathematical disciplines and levels of complexity.

* Project Page: https://omni-math.github.io/
* GitHub Repo: https://github.com/KbsdJames/Omni-MATH
* Omni-Judge (open-source evaluator for this dataset): https://huggingface.co/KbsdJames/Omni-Judge

## Dataset Details

## Uses

```python
from datasets import load_dataset

dataset = load_dataset("KbsdJames/Omni-MATH")
```

For further evaluation of models, please refer to our GitHub repository: https://github.com/KbsdJames/Omni-MATH

## Citation

If you find our code and dataset helpful, please consider citing our paper.

```
@misc{gao2024omnimathuniversalolympiadlevel,
  title={Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models},
  author={Bofei Gao and Feifan Song and Zhe Yang and Zefan Cai and Yibo Miao and Qingxiu Dong and Lei Li and Chenghao Ma and Liang Chen and Runxin Xu and Zhengyang Tang and Benyou Wang and Daoguang Zan and Shanghaoran Quan and Ge Zhang and Lei Sha and Yichang Zhang and Xuancheng Ren and Tianyu Liu and Baobao Chang},
  year={2024},
  eprint={2410.07985},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2410.07985},
}
```
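The card above says problems are categorized by sub-domain and difficulty level to support fine-grained analysis of model performance. The following is a minimal sketch of such a breakdown, assuming you have already graded each problem (for example with Omni-Judge or another checker) and stored one record per problem. The field names `difficulty`, `domain`, and `correct`, and the helper `accuracy_breakdown`, are hypothetical illustrations, not the dataset's confirmed schema; check the actual columns before adapting it.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, Mapping


def accuracy_breakdown(
    records: Iterable[Mapping[str, Any]],
    key: Callable[[Mapping[str, Any]], Hashable],
) -> Dict[Hashable, float]:
    """Group graded records by `key` and return {group: accuracy}.

    Each record is assumed (hypothetically) to carry a boolean "correct" field.
    """
    totals: Dict[Hashable, int] = defaultdict(int)
    correct: Dict[Hashable, int] = defaultdict(int)
    for rec in records:
        group = key(rec)
        totals[group] += 1
        correct[group] += int(bool(rec["correct"]))
    # Sort groups so the output order is stable (difficulty ascending, domains alphabetical).
    return {g: correct[g] / totals[g] for g in sorted(totals)}


# Hypothetical graded results; field names and values are illustrative only.
results = [
    {"difficulty": 1.0, "domain": "Algebra", "correct": True},
    {"difficulty": 1.0, "domain": "Geometry", "correct": False},
    {"difficulty": 7.5, "domain": "Algebra", "correct": False},
    {"difficulty": 7.5, "domain": "Number Theory", "correct": True},
]

by_level = accuracy_breakdown(results, key=lambda r: r["difficulty"])
by_domain = accuracy_breakdown(results, key=lambda r: r["domain"])
print(by_level)   # {1.0: 0.5, 7.5: 0.5}
print(by_domain)  # {'Algebra': 0.5, 'Geometry': 0.0, 'Number Theory': 1.0}
```

Passing a `key` function rather than a fixed field name keeps the helper agnostic to how the real dataset encodes its categories; if domains turn out to be hierarchical strings or lists, only the lambda needs to change.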


Provided by:

introvoyz041
