MATH-Reasoning-Paths
ModelScope community · Updated 2025-10-13 · Indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/AI-ModelScope/MATH-Reasoning-Paths
Description:
# Sampled Reasoning Paths for the MATH dataset
This dataset contains sampled reasoning paths for the [MATH](https://github.com/hendrycks/math) dataset, released as part of the NeurIPS 2025 paper: ["A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning"](https://wnjxyk.github.io/RPC).
## Overview
We generated multiple reasoning paths for MATH problems using 3 math LLMs:
* [Deepseek-Math-RL-7B](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)
* [InternLM2-Math-Plus-1.8B](https://huggingface.co/internlm/internlm2-math-plus-1_8b)
* [InternLM2-Math-Plus-7B](https://huggingface.co/internlm/internlm2-math-plus-7b)
For each problem in the MATH dataset, we sampled 100 reasoning paths. Sampling was performed at temperatures T ∈ {1.0, 1.1, 1.3} to explore diverse reasoning trajectories.
## Structure of each JSON
The JSON structure is illustrated below with an example of 3 samples per problem across 2 problems:
```json
{
  "predict": [ // 2D string array: [problems][samples]
    ["Prediction #1 for Problem 1", "Prediction #2 for Problem 1", "Prediction #3 for Problem 1"],
    ["Prediction #1 for Problem 2", "Prediction #2 for Problem 2", "Prediction #3 for Problem 2"]
  ],
  "answer": [ // Ground truth answers, one per problem
    "Answer for Problem 1", "Answer for Problem 2"
  ],
  "completion": [ // 2D string array: [problems][samples]
    ["Completion #1 for Problem 1", "Completion #2 for Problem 1", "Completion #3 for Problem 1"],
    ["Completion #1 for Problem 2", "Completion #2 for Problem 2", "Completion #3 for Problem 2"]
  ],
  "cumulative_logprob": [ // Sum of log probabilities per sample: [problems][samples]
    [-15.526, -12.123, -14.12],
    [-20.526, -22.123, -24.12]
  ],
  "mean_logprob": [ // Length-normalized log probabilities (sum / sequence length); perplexity = exp(-mean_logprob)
    [-0.070, -0.04, -0.05],
    [-0.170, -0.14, -0.15]
  ],
  "prompt": [ // Input prompts, one per problem
    "Prompt for Problem 1", "Prompt for Problem 2"
  ],
  "temperature": 0, // Sampling temperature
  "top_p": 1, // Nucleus sampling parameter
  "accuracy": [ // 2D boolean array: [samples][problems]
    [false, true],
    [true, true],
    [true, true]
  ]
}
```
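The layout above can be exercised with a small Python sketch. It uses a toy record in the same shape (2 problems, 3 samples each) rather than a real file, and the helper names `majority_vote`, `per_problem_accuracy`, and `perplexity` are illustrative, not part of the dataset or the paper's code. It also assumes predictions can be compared as plain strings; real MATH answers may need normalization before voting. Note that `accuracy` is indexed `[samples][problems]`, the transpose of `predict`.

```python
import math
from collections import Counter

# Toy record mirroring the documented layout (values are made up).
record = {
    "predict": [["4", "4", "5"], ["7", "8", "8"]],         # [problems][samples]
    "answer": ["4", "8"],
    "mean_logprob": [[-0.07, -0.04, -0.05], [-0.17, -0.14, -0.15]],
    "accuracy": [[True, False], [True, True], [False, True]],  # [samples][problems]
}

def majority_vote(predictions):
    """Return the most frequent prediction; ties go to the one seen first."""
    return Counter(predictions).most_common(1)[0][0]

def per_problem_accuracy(accuracy):
    """Transpose [samples][problems] and average over samples for each problem."""
    return [sum(column) / len(column) for column in zip(*accuracy)]

def perplexity(mean_logprob):
    """Convert a length-normalized log probability into perplexity."""
    return math.exp(-mean_logprob)

# Self-consistency style answer per problem (plain string matching assumed).
votes = [majority_vote(samples) for samples in record["predict"]]
voted_correct = [v == a for v, a in zip(votes, record["answer"])]

# Fraction of sampled paths that are correct, per problem.
sample_accuracy = per_problem_accuracy(record["accuracy"])

# Perplexity of every sample, kept in the [problems][samples] layout.
ppl = [[perplexity(lp) for lp in row] for row in record["mean_logprob"]]
```

With a real file, the same functions should apply after loading the JSON into `record`, provided the file follows the structure shown above.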
## Available Files
|Temperature|Deepseek-Math-RL-7B|InternLM2-Math-Plus-7B|InternLM2-Math-Plus-1.8B|
|:--:|:--|:--|:--|
|T=1.0|`Deepseek-Math-RL-7B.json`|`InternLM2-Math-Plus-7B.json`|`InternLM2-Math-Plus-1.8B.json`|
|T=1.1|`Deepseek-Math-RL-7B-T=1.1.json`|`InternLM2-Math-Plus-7B-T=1.1.json`|`InternLM2-Math-Plus-1.8B-T=1.1.json`|
|T=1.3|`Deepseek-Math-RL-7B-T=1.3.json`|`InternLM2-Math-Plus-7B-T=1.3.json`|not available|
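The naming pattern in the table can be captured in a short helper. This is a minimal sketch: `result_filename` and `load_paths` are hypothetical names, `data_dir` is assumed to be a local folder where the files were already downloaded, and the files are assumed to be plain JSON without the explanatory comments shown in the structure example. The one combination without a file (InternLM2-Math-Plus-1.8B at T=1.3) is rejected explicitly.

```python
import json
from pathlib import Path

MODELS = (
    "Deepseek-Math-RL-7B",
    "InternLM2-Math-Plus-7B",
    "InternLM2-Math-Plus-1.8B",
)
TEMPERATURES = (1.0, 1.1, 1.3)
# Combination listed in the table without a file.
MISSING = {("InternLM2-Math-Plus-1.8B", 1.3)}

def result_filename(model: str, temperature: float) -> str:
    """Build the file name for a model/temperature pair, following the table."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if temperature not in TEMPERATURES:
        raise ValueError(f"unsupported temperature: {temperature}")
    if (model, temperature) in MISSING:
        raise ValueError(f"no file listed for {model} at T={temperature}")
    # T=1.0 files carry no temperature suffix.
    if temperature == 1.0:
        return f"{model}.json"
    return f"{model}-T={temperature}.json"

def load_paths(data_dir: str, model: str, temperature: float) -> dict:
    """Load one result file from a local directory (assumed already downloaded)."""
    path = Path(data_dir) / result_filename(model, temperature)
    with path.open(encoding="utf-8") as f:
        return json.load(f)
```

For example, `result_filename("InternLM2-Math-Plus-7B", 1.1)` yields `InternLM2-Math-Plus-7B-T=1.1.json`.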
## Citation
If you use this dataset in your research, please cite:
```bibtex
@inproceedings{zhou24theoretical,
author = {Zhou, Zhi and Tan, Yuhao and Li, Zenan and Yao, Yuan and Guo, Lan-Zhe and Li, Yu-Feng and Ma, Xiaoxing},
title = {A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
}
```
Provider: maas
Created: 2025-10-10



