MathOdyssey-Reasoning-Paths
Source: ModelScope community. Updated 2025-10-17; indexed 2025-11-03.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/MathOdyssey-Reasoning-Paths
# Sampled Reasoning Paths for the MathOdyssey dataset
This dataset contains sampled reasoning paths for the [MathOdyssey](https://huggingface.co/datasets/MathOdyssey/MathOdyssey) dataset, released as part of the NeurIPS 2025 paper: ["A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning"](https://wnjxyk.github.io/RPC).
## Overview
We generated multiple reasoning paths for MathOdyssey problems using 3 math LLMs:
* [Deepseek-Math-RL-7B](https://huggingface.co/deepseek-ai/deepseek-math-7b-rl)
* [InternLM2-Math-Plus-1.8B](https://huggingface.co/internlm/internlm2-math-plus-1_8b)
* [InternLM2-Math-Plus-7B](https://huggingface.co/internlm/internlm2-math-plus-7b)
For each problem in the MathOdyssey dataset, we sampled 256 reasoning paths. Sampling used temperatures T ∈ {1.0, 1.1, 1.3} to explore diverse reasoning trajectories.
## Structure of each JSON
The JSON structure is illustrated below using an example with 2 problems and 3 samples per problem. The `//` comments are explanatory annotations; standard JSON does not permit comments.
```json
{
  "predict": [ // 2D string array: [problems][samples]
    ["Prediction #1 for Problem 1", "Prediction #2 for Problem 1", "Prediction #3 for Problem 1"],
    ["Prediction #1 for Problem 2", "Prediction #2 for Problem 2", "Prediction #3 for Problem 2"]
  ],
  "answer": [ // Ground truth answers, one per problem
    "Answer for Problem 1", "Answer for Problem 2"
  ],
  "completion": [ // 2D string array: [problems][samples]
    ["Completion #1 for Problem 1", "Completion #2 for Problem 1", "Completion #3 for Problem 1"],
    ["Completion #1 for Problem 2", "Completion #2 for Problem 2", "Completion #3 for Problem 2"]
  ],
  "cumulative_logprob": [ // Sum of log probabilities per sample
    [-15.526, -12.123, -14.12],
    [-20.526, -22.123, -24.12]
  ],
  "mean_logprob": [ // Length-normalized log probabilities (sum / sequence length); perplexity = exp(-mean_logprob)
    [-0.070, -0.04, -0.05],
    [-0.170, -0.14, -0.15]
  ],
  "prompt": [ // Input prompts, one per problem
    "Prompt for Problem 1", "Prompt for Problem 2"
  ],
  "temperature": 0, // Sampling temperature
  "top_p": 1, // Nucleus sampling parameter
  "accuracy": [ // 2D boolean array: [samples][problems] (transposed relative to "predict")
    [false, true],
    [true, true],
    [true, true]
  ]
}
```
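As a rough illustration of how these fields might be consumed, the sketch below loads a file, takes a majority vote over each problem's sampled predictions (the basic self-consistency rule), averages the `accuracy` grid, and converts a `mean_logprob` value to perplexity. The helper names (`load_paths`, `majority_vote`, `mean_sample_accuracy`, `perplexity`) and the toy record are hypothetical and not part of the dataset. The sketch also assumes the released files are plain JSON without the `//` annotations shown above, and it compares predictions by exact string match, which may be stricter than the answer matching used in the paper.

```python
import json
import math
from collections import Counter


def load_paths(path):
    """Load one dataset file (assumed to be plain JSON) into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def majority_vote(predictions):
    """Return the most frequent prediction among one problem's samples."""
    return Counter(predictions).most_common(1)[0][0]


def mean_sample_accuracy(accuracy):
    """Average the boolean grid, which is laid out as [samples][problems]."""
    flat = [flag for row in accuracy for flag in row]
    return sum(flat) / len(flat)


def perplexity(mean_logprob):
    """Convert a mean per-token log probability into perplexity."""
    return math.exp(-mean_logprob)


# Toy record mirroring the illustrated layout; all values are made up.
toy = {
    "predict": [["4", "4", "5"], ["7", "8", "8"]],   # [problems][samples]
    "answer": ["4", "8"],
    "mean_logprob": [[-0.070, -0.04, -0.05], [-0.170, -0.14, -0.15]],
    "accuracy": [[True, False], [True, True], [False, True]],  # [samples][problems]
}

votes = [majority_vote(samples) for samples in toy["predict"]]
voted_correct = [vote == ans for vote, ans in zip(votes, toy["answer"])]
print(votes, voted_correct)
print(mean_sample_accuracy(toy["accuracy"]))
print(perplexity(toy["mean_logprob"][0][0]))
```

For a real file, `data = load_paths("Deepseek-Math-RL-7B.json")` would replace the toy record. Because `accuracy` is indexed as `[samples][problems]` while `predict` is `[problems][samples]`, the two grids need a transpose before they can be compared element by element.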
## Available Files
|Temperature|Deepseek-Math-RL-7B|InternLM2-Math-Plus-7B|InternLM2-Math-Plus-1.8B|
|:--:|:--|:--|:--|
|T=1.0|`Deepseek-Math-RL-7B.json`|`InternLM2-Math-Plus-7B.json`|`InternLM2-Math-Plus-1.8B.json`|
|T=1.1|`Deepseek-Math-RL-7B-T=1.1.json`|`InternLM2-Math-Plus-7B-T=1.1.json`|`InternLM2-Math-Plus-1.8B-T=1.1.json`|
|T=1.3|`Deepseek-Math-RL-7B-T=1.3.json`|`InternLM2-Math-Plus-7B-T=1.3.json`|`InternLM2-Math-Plus-1.8B-T=1.3.json`|
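The naming pattern in the table can be captured in a small helper. This is a minimal sketch: the function name `file_name` and the constants `MODELS` and `TEMPERATURES` are hypothetical, and the pattern is inferred only from the nine file names listed above (the T=1.0 files carry no temperature suffix).

```python
MODELS = (
    "Deepseek-Math-RL-7B",
    "InternLM2-Math-Plus-7B",
    "InternLM2-Math-Plus-1.8B",
)
TEMPERATURES = (1.0, 1.1, 1.3)


def file_name(model, temperature):
    """Build the JSON file name for a model and sampling temperature."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if temperature not in TEMPERATURES:
        raise ValueError(f"unsupported temperature: {temperature!r}")
    # Files sampled at T=1.0 have no suffix; the others end in "-T=<value>".
    if temperature == 1.0:
        return f"{model}.json"
    return f"{model}-T={temperature}.json"


# Enumerate every file listed in the table.
all_files = [file_name(m, t) for t in TEMPERATURES for m in MODELS]
print(all_files)
```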
## Citation
If you use this dataset in your research, please cite:
```bibtex
@inproceedings{zhou24theoretical,
author = {Zhou, Zhi and Tan, Yuhao and Li, Zenan and Yao, Yuan and Guo, Lan-Zhe and Li, Yu-Feng and Ma, Xiaoxing},
title = {A Theoretical Study on Bridging Internal Probability and Self-Consistency for LLM Reasoning},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
}
```
Provider: maas
Created: 2025-10-10



