StepMathBench

Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/SHU-XUN/StepMathAgent
Resource overview:
StepMathBench is a benchmark of 1,000 step-by-step process evaluation instances derived from 200 high-quality mathematical problems. The problems are grouped by problem type (computation, proof, and open-ended), subject category, and difficulty level. Solutions were generated by multiple large language models (LLMs) and scored by six highly qualified raters. The benchmark's task is mathematical process evaluation.