
MATH-Reasoning-Paths

ModelScope Community. Updated 2025-10-13; indexed 2025-11-08.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/MATH-Reasoning-Paths
Description:
# Sampled Reasoning Paths for the MATH dataset

This dataset contains sampled reasoning paths for the [MATH](https://github.com/hendrycks/math) dataset, released as part of the NeurIPS 2025 paper ["A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning"](https://wnjxyk.github.io/RPC).

## Overview

We generated multiple reasoning paths for MATH problems using three math LLMs:

* [Deepseek-Math-RL-7B](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)
* [InternLM2-Math-Plus-1.8B](https://huggingface.co/internlm/internlm2-math-plus-1_8b)
* [InternLM2-Math-Plus-7B](https://huggingface.co/internlm/internlm2-math-plus-7b)

For each problem in the MATH dataset, we sampled 100 reasoning paths. Sampling was performed at temperatures T ∈ {1.0, 1.1, 1.3} to explore diverse reasoning trajectories.

## Structure of each JSON

The JSON structure is illustrated below with an example of 3 samples per problem across 2 problems. The `//` comments are annotations for this illustration only.

```json
{
  "predict": [             // 2D string array: [problems][samples], the predicted answer of each sample
    ["Prediction #1 for Problem 1", "Prediction #2 for Problem 1", "Prediction #3 for Problem 1"],
    ["Prediction #1 for Problem 2", "Prediction #2 for Problem 2", "Prediction #3 for Problem 2"]
  ],
  "answer": [              // Ground truth answers, one per problem
    "Answer for Problem 1",
    "Answer for Problem 2"
  ],
  "completion": [          // 2D string array: [problems][samples], the full generated reasoning of each sample
    ["Completion #1 for Problem 1", "Completion #2 for Problem 1", "Completion #3 for Problem 1"],
    ["Completion #1 for Problem 2", "Completion #2 for Problem 2", "Completion #3 for Problem 2"]
  ],
  "cumulative_logprob": [  // Sum of log probabilities per sample
    [-15.526, -12.123, -14.12],
    [-20.526, -22.123, -24.12]
  ],
  "mean_logprob": [        // Length-normalized log probabilities (sum / sequence length); perplexity = exp(-mean_logprob)
    [-0.070, -0.04, -0.05],
    [-0.170, -0.14, -0.15]
  ],
  "prompt": [              // Input prompt for each problem
    "Prompt for Problem 1",
    "Prompt for Problem 2"
  ],
  "temperature": 0,        // Sampling temperature
  "top_p": 1,              // Nucleus sampling parameter
  "accuracy": [            // 2D boolean array: [samples][problems], whether each sample's prediction is correct
    [false, true],
    [true, true],
    [true, true]
  ]
}
```

## Available Files

| Temperature | Deepseek-Math-RL-7B | InternLM2-Math-Plus-7B | InternLM2-Math-Plus-1.8B |
|:--:|:--|:--|:--|
| T=1.0 | `Deepseek-Math-RL-7B.json` | `InternLM2-Math-Plus-7B.json` | `InternLM2-Math-Plus-1.8B.json` |
| T=1.1 | `Deepseek-Math-RL-7B-T=1.1.json` | `InternLM2-Math-Plus-7B-T=1.1.json` | `InternLM2-Math-Plus-1.8B-T=1.1.json` |
| T=1.3 | `Deepseek-Math-RL-7B-T=1.3.json` | `InternLM2-Math-Plus-7B-T=1.3.json` | not available |

## Citation

If you use this dataset in your research, please cite:

```bibtex
@inproceedings{zhou24theoretical,
  author    = {Zhou, Zhi and Tan, Yuhao and Li, Zenan and Yao, Yuan and Guo, Lan-Zhe and Li, Yu-Feng and Ma, Xiaoxing},
  title     = {A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
}
```

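## Example usage (illustrative)

The field layout described under "Structure of each JSON" can be explored with a few lines of Python. The sketch below is not part of the official release: the helper names (`load_paths`, `majority_vote`, `self_consistency_accuracy`, `mean_sample_accuracy`, `perplexities`) are hypothetical, the files are assumed to be plain JSON without the `//` annotations, and answers are compared by exact string match, which may be stricter than the equivalence check used to produce the `accuracy` field.

```python
import json
import math
from collections import Counter


def load_paths(path):
    """Load one result file, e.g. 'Deepseek-Math-RL-7B.json' (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(predictions):
    """Return the most frequent predicted answer among one problem's samples."""
    return Counter(predictions).most_common(1)[0][0]


def self_consistency_accuracy(data):
    """Fraction of problems whose majority-voted prediction equals the reference answer.

    Assumption: exact string equality stands in for a proper math-equivalence check.
    """
    votes = [majority_vote(preds) for preds in data["predict"]]  # [problems]
    correct = sum(vote == answer for vote, answer in zip(votes, data["answer"]))
    return correct / len(data["answer"])


def mean_sample_accuracy(data):
    """Average of the boolean `accuracy` entries, which are laid out as [samples][problems]."""
    flags = [flag for row in data["accuracy"] for flag in row]
    return sum(flags) / len(flags)


def perplexities(data):
    """Convert `mean_logprob` ([problems][samples]) to perplexity via exp(-mean_logprob)."""
    return [[math.exp(-lp) for lp in row] for row in data["mean_logprob"]]


# Toy data with the documented layout: 2 problems, 3 samples per problem.
toy = {
    "predict": [["4", "4", "5"], ["7", "8", "8"]],
    "answer": ["4", "7"],
    "mean_logprob": [[-0.070, -0.04, -0.05], [-0.170, -0.14, -0.15]],
    "accuracy": [[True, True], [True, False], [False, False]],
}

print(self_consistency_accuracy(toy))  # 0.5
print(mean_sample_accuracy(toy))       # 0.5
```

With a downloaded file, `data = load_paths("Deepseek-Math-RL-7B.json")` would replace the toy dictionary. Note that `predict` is indexed as [problems][samples] while `accuracy` is indexed as [samples][problems], so the two arrays must be transposed relative to each other before they are compared element by element.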
Provider: maas
Created: 2025-10-10