MathIF

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/TingchenFu/MathIF

Overview:

MathIF is a benchmark designed to evaluate how well Large Reasoning Models (LRMs) follow instructions in mathematical reasoning tasks. It evaluates a range of models across three parameter scales and highlights the tension between scaling up reasoning capability and maintaining controllability.
