MathIF
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/TingchenFu/MathIF
Description:
MathIF is a benchmark designed to evaluate how well Large Reasoning Models (LRMs) follow instructions while solving mathematical reasoning problems. It evaluates a range of models at three distinct parameter scales and highlights the tension between scaling up reasoning capability and maintaining controllability.



