BeyondAIME

Name: BeyondAIME
Creator: maas
Published: 2026-01-06 16:36:06
License: no description provided

Source: ModelScope community. Updated: 2026-01-06. Indexed: 2025-06-21.

Download link:

https://modelscope.cn/datasets/ByteDance-Seed/BeyondAIME

Resource overview:

# BeyondAIME: Advancing Math Reasoning Evaluation Beyond High School Olympiads

## Table of Contents

- [Dataset Description](#dataset-description)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [How to Use](#how-to-use)
- [Citation](#citation)
- [License](#license)

## Dataset Description

**BeyondAIME** is a curated test set designed to benchmark advanced mathematical reasoning. Its creation was guided by the following core principles to ensure a fair and challenging evaluation:

- **High Difficulty**: Problems are sourced from high-school and university mathematics competitions, with a difficulty level greater than or equal to that of AIME Problems #11-15.
- **Contamination-Resistant**: Every problem has been manually revised to be unique, ensuring it will not be found in standard pre-training corpora and providing a true test of a model's reasoning abilities.
- **Focus on Reasoning, Not Knowledge**: The dataset exclusively tests reasoning by ensuring that problems do not require mathematical knowledge beyond the standard university level.
- **Robust Problem Design**: The dataset avoids "pseudo-proof" problems. Problems requiring proof-like steps have been reformulated so that guessing the answer is as difficult as formally solving the problem.
- **Automated & Accurate Evaluation**: Each problem's answer is a positive integer, allowing for an unambiguous and 100% accurate automated verification of model performance.

## Data Fields

Each entry in the dataset consists of two fields:

- `problem`: (`string`) - A full statement of the mathematical problem, formatted in Markdown with LaTeX support for mathematical expressions.
- `answer`: (`int`) - The final integer answer to the problem.

Here is an example of a data instance:

```json
{
  "problem": "A sequence of real numbers \\{a_n\\} satisfies that：\\(a_{n + 1}=2^n-7a_n，n = 0,1,2,\\cdots\\). Find the minimal possible value of \\(\\frac{1}{a_0}\\) such that \\(a_{n + 1}>a_n\\) for any positive integer \\(n\\).",
  "answer": 9
}
```

## Data Splits

This dataset consists of a single **`test`** split containing 100 problems, provided in the `test.parquet` file.

## Dataset Creation

Each problem is an original creation at a competition-level difficulty, and the dataset has been balanced by category to ensure coverage across all fields of mathematics competitions.

## How to Use

You can easily load the dataset using the Hugging Face `datasets` library.

```python
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub
ds = load_dataset("ByteDance-Seed/BeyondAIME")

# Access the test split
test_ds = ds['test']

# Print the first example
print(test_ds[0])
```

## Citation

If you use the BeyondAIME dataset in your research or work, please consider citing it:

```bibtex
@misc{bytedance_seed_2025_beyondaime,
  author = {[ByteDance-Seed]},
  title = {BeyondAIME: Advancing Math Reasoning Evaluation Beyond High School Olympiads},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/datasets/ByteDance-Seed/BeyondAIME}},
}
```

## License

BeyondAIME is released under the **CC0 1.0 Universal (CC0 1.0) Public Domain Dedication**.

![CC0](https://licensebuttons.net/p/zero/1.0/88x31.png)

This means the work has been dedicated to the public domain by waiving all rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute, and perform the work, even for commercial purposes, all without asking permission.

For more details, see the [LICENSE](LICENSE) file or the [full legal text of the CC0 license](https://creativecommons.org/publicdomain/zero/1.0/legalcode).
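
Because every `answer` is a positive integer, scoring can reduce to an exact integer comparison. The following is a minimal sketch of such a scorer, not the dataset authors' evaluation protocol. The helper names `extract_final_integer` and `exact_match_accuracy` are hypothetical, and treating the last integer in a model response as its final answer is an assumption that a real harness may want to replace with a stricter answer format.

```python
import re
from typing import Optional, Sequence


def extract_final_integer(response: str) -> Optional[int]:
    """Return the last run of digits in a response as an int, or None.

    Assumption: the model states its final answer last. This is a
    hypothetical convention, not something the dataset prescribes.
    """
    matches = re.findall(r"\d+", response)
    return int(matches[-1]) if matches else None


def exact_match_accuracy(responses: Sequence[str], answers: Sequence[int]) -> float:
    """Fraction of responses whose extracted integer equals the reference answer."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    if not answers:
        raise ValueError("at least one problem is required")
    correct = sum(
        extract_final_integer(response) == int(answer)
        for response, answer in zip(responses, answers)
    )
    return correct / len(answers)
```

With the split loaded as `test_ds`, the reference answers would be `test_ds["answer"]`, and `responses` would hold one model output string per problem in the same order.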


Provider: maas

Created: 2025-06-18
