
guanning-ai/beyondaime

Hugging Face | Updated 2025-10-21 | Indexed 2025-10-25
Download link:
https://hf-mirror.com/datasets/guanning-ai/beyondaime
Description:
BeyondAIME is a curated test set designed to benchmark advanced mathematical reasoning. Its creation was guided by a set of core principles intended to keep the evaluation fair and challenging. Problems are sourced from high-school and university mathematics competitions, with a difficulty greater than or equal to that of AIME Problems #11-15. Each problem is unique, so that it cannot be found in standard pre-training corpora and genuinely tests a model's reasoning abilities. The dataset focuses on reasoning: no problem requires mathematical knowledge beyond the standard university level. The problems are designed to avoid pseudo-proof problems, and each has a positive integer answer, which allows unambiguous and 100% accurate automated verification of model performance.
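Because every answer is a positive integer, scoring can be reduced to an exact integer comparison. The following is a minimal sketch of that idea, not the dataset's official scoring script. The function names (`extract_final_integer`, `is_correct`, `accuracy`) are hypothetical, and the rule of taking the last integer that appears in a model's output is an assumption made for illustration.

```python
import re
from typing import List, Optional


def extract_final_integer(model_output: str) -> Optional[int]:
    """Return the last run of digits in the text as an int, or None if absent.

    Assumption: the model states its final answer last. This simple rule does
    not handle decimals or thousands separators.
    """
    matches = re.findall(r"\d+", model_output)
    return int(matches[-1]) if matches else None


def is_correct(model_output: str, reference: int) -> bool:
    """Exact-match check between the extracted integer and the reference."""
    predicted = extract_final_integer(model_output)
    return predicted is not None and predicted == reference


def accuracy(model_outputs: List[str], references: List[int]) -> float:
    """Fraction of outputs whose extracted integer equals the reference."""
    if len(model_outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    if not references:
        return 0.0
    correct = sum(is_correct(o, r) for o, r in zip(model_outputs, references))
    return correct / len(references)
```

Exact integer matching avoids the ambiguity of comparing free-form expressions, which is the property the description relies on for automated verification.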
Provider:
guanning-ai