GPT-OSS-120B-Distilled-Reasoning-math
Source: ModelScope community. Updated 2026-01-08; indexed 2025-09-06.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/GPT-OSS-120B-Distilled-Reasoning-math
Resource description:

# GPT-oss-120B-Distilled-Reasoning-math Dataset
**Data Source Model**: **gpt-oss-120b**
**Task Type**: Mathematical Problem Solving
**Data Format**: JSON Lines
**Fields**: Generator, Category, Input, CoT_Native_Reasoning, Reasoning, Answer
---
# Core Statistics
Complete reasoning processes and answers were generated with **gpt-oss-120b** (MXFP4).
Text length reflects the depth and complexity of the content, so I computed length statistics for the **Input** (question), **Reasoning**, and **Answer** fields.
To make the data distribution easier to read, I also visualized these statistics.

---
## Quality and Content Evaluation
No LLM-based scoring model was used for this evaluation. Instead, two custom quantitative metrics assess the data structure and reasoning characteristics:
- **Reasoning Complexity Ratio**: **39.19**
*Calculation Method*: Average reasoning characters ÷ Average input characters
*Meaning*: Measures the extent of the model's reasoning chain. A higher value means the model provides sufficient reasoning details even for short questions.
- **Answer Efficiency Ratio**: **0.67**
*Calculation Method*: Average answer words ÷ Average reasoning words
*Meaning*: Measures the refinement from reasoning to the answer. A lower value indicates that the reasoning is divergent, while the answer is convergent and concise.
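The two ratios above can be reproduced with a few lines of Python. This is a minimal sketch, not the author's script: it assumes each record is a dict with `Input`, `Reasoning`, and `Answer` keys (taken from the field list; pass other names if your copy differs), and it assumes "words" means whitespace-separated tokens.

```python
def length_ratios(records, input_key="Input",
                  reasoning_key="Reasoning", answer_key="Answer"):
    """Return (reasoning_complexity_ratio, answer_efficiency_ratio).

    Key names and whitespace tokenization are assumptions for illustration.
    """
    n = len(records)
    if n == 0:
        raise ValueError("records must not be empty")

    # Character-based averages for the Reasoning Complexity Ratio.
    avg_input_chars = sum(len(r[input_key]) for r in records) / n
    avg_reasoning_chars = sum(len(r[reasoning_key]) for r in records) / n

    # Word-based averages for the Answer Efficiency Ratio.
    avg_answer_words = sum(len(r[answer_key].split()) for r in records) / n
    avg_reasoning_words = sum(len(r[reasoning_key].split()) for r in records) / n

    complexity = avg_reasoning_chars / avg_input_chars
    efficiency = avg_answer_words / avg_reasoning_words
    return complexity, efficiency


# Toy record, only to show the call shape.
toy = [{"Input": "abcd", "Reasoning": "one two three four", "Answer": "two words"}]
complexity, efficiency = length_ratios(toy)
```

Both metrics are ratios of averages, as the calculation methods state, rather than averages of per-sample ratios; the two can differ noticeably on skewed data.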


---
## Comprehensive Evaluation
The dataset demonstrates high-quality mathematical problem-solving capabilities, featuring:
- **Comprehensive Reasoning Chain**: Detailed thought processes and clear logical steps.
- **Rich Mathematical Expression**: Effective use of LaTeX for formula typesetting.
- **Balanced Input-Output Relationship**: The complexity of the reasoning process is reasonably correlated with the complexity of the problem.
---
## Dataset Structure
**File Format**: .jsonl (one sample per line, independent JSON object)
To make training easier for everyone, I have prepared various data structure templates, offering three common annotation types for different distillation and cleaning logic.
**1. Standard JSON Structure**
This structure explicitly separates the chain of thought from the final answer, which makes it convenient for training reasoning models or building SFT data.
```json
{
"generator": "gpt-oss-120b",
"category": "math",
"Input": "Given that 2^x = 8, find x.",
"CoT_Native_Reasoning": "We note that 8 = 2^3...",
"answer": "The answer is 3."
}
```
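Because each line is an independent JSON object, records in this structure can be read one line at a time. The sketch below is illustrative only: `iter_jsonl` is a hypothetical helper, and the key names follow the example above.

```python
import json


def iter_jsonl(lines):
    """Yield one dict per non-empty line; report the line number on bad JSON."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between samples
        try:
            yield json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_no}: invalid JSON") from exc


# In-memory sample; with a real file, pass the open file object instead.
sample_lines = [
    '{"generator": "gpt-oss-120b", "category": "math", '
    '"Input": "Given that 2^x = 8, find x.", '
    '"CoT_Native_Reasoning": "We note that 8 = 2^3...", '
    '"answer": "The answer is 3."}',
]
for record in iter_jsonl(sample_lines):
    question = record["Input"]
    reasoning = record["CoT_Native_Reasoning"]
    answer = record["answer"]
```

Streaming line by line keeps memory use flat, which matters when individual reasoning traces are long.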
**2. OpenAI Harmony**
Messages are enclosed in `<start>|user|message|>...<end>` and `<start>|assistant|>...<end>` tags, following the OpenAI Harmony style.
```json
{
"generator": "gpt-oss-120b",
"category": "math",
"Input": "<start>|user|message|>In triangle ABC with BC=3, ... <end>",
"output": "<start>|assistant|>We have a right triangle at C, ... <end>"
}
```
**3. Think**
Reasoning content is wrapped in `<think>` and `</think>` tags, and the answer follows directly after the closing tag. This matches the format used by the Qwen3 series and DeepSeek models.
```json
{
"generator": "gpt-oss-120b",
"category": "math",
"Input": "Solve: If 12x = 36, what is x?",
"output": "<think>First, divide both sides by 12. 36 / 12 = 3. So x = 3.</think> The answer is 3."
}
```
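When a Think-style `output` needs to be turned back into separate reasoning and answer strings, a small splitter is enough. The following is a sketch under stated assumptions: `split_think` is a hypothetical helper, and the tag strings are parameters because the exact delimiters depend on the copy of the data you have.

```python
def split_think(output, open_tag="<think>", close_tag="</think>"):
    """Split a Think-style output into (reasoning, answer).

    Returns (None, text) when the expected tags are not found, so callers
    can decide whether to drop or keep such samples.
    """
    text = output.strip()
    if not text.startswith(open_tag) or close_tag not in text:
        return None, text
    # Everything up to the first closing tag is reasoning; the rest is the answer.
    reasoning, _, answer = text[len(open_tag):].partition(close_tag)
    return reasoning.strip(), answer.strip()


reasoning, answer = split_think(
    "<think>First, divide both sides by 12. 36 / 12 = 3. So x = 3.</think> The answer is 3."
)
```

Returning `None` for untagged text, instead of raising, makes it easy to count malformed samples during cleaning.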
---
## Training and Usage Recommendations
- **Alignment Training**: For CoT training, make sure the template tags match what the target model expects.
- **Evaluation**: Report accuracy both with and without CoT; provide a boxed-answer ("answer-in-the-box") parser to make numerical extraction stable.
- **Safety Thresholds**: Prefer dropping erroneous or inconsistent samples over keeping them for volume; set an upper length limit for long samples and process them in chunks.
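One possible shape for the boxed-answer parser mentioned in the evaluation bullet is sketched below. It assumes final answers are written as LaTeX `\boxed{...}`, which this card does not confirm, and `extract_boxed` is a hypothetical name. Brace matching is done by hand so that nested expressions such as fractions are captured whole.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumes LaTeX-style boxing; returns None if no box is found or the
    braces are unbalanced.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)  # the last box is usually the final answer
    if start == -1:
        return None

    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # ran out of text before the box closed


final = extract_boxed(r"Dividing both sides by 12 gives \boxed{3}.")
```

A regular expression alone cannot match arbitrarily nested braces, which is why the sketch counts depth explicitly.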
---
## Acknowledgements
The construction of this dataset is based on the generation capabilities of **gpt-oss-120b** and the optimized design of mathematical reasoning templates.
Special thanks to the open-source community for their contributions in **mathematical expression formatting**, **data cleaning scripts**, and **visualization analysis**.
**Seed Questions**: Derived in part from *nvidia/Nemotron-Post-Training-Dataset-v1*.
**License**: CC-BY-4.0
**Dataset Citation**:
```
@dataset{jackrong_2025_gpt_oss_math_distill,
title = {GPT-OSS-120B-Distilled-Reasoning-math},
author = {Jackrong},
year = {2025},
url = {https://huggingface.co/datasets/Jackrong/GPT-OSS-120B-Distilled-Reasoning-math}
}
```
---
Provider: maas
Created: 2025-08-18



