
GPT-OSS-120B-Distilled-Reasoning-math

ModelScope Community, updated 2026-01-08, indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/AI-ModelScope/GPT-OSS-120B-Distilled-Reasoning-math
Resource description:
![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/Z_e-AT-WC-W1FyrvBhPS1.jpeg)

# GPT-oss-120B-Distilled-Reasoning-math Dataset

**Data Source Model**: **gpt-oss-120b**

**Task Type**: Mathematical Problem Solving

**Data Format**: JSON Lines

**Fields**: Generator, Category, Input, CoT_Native_Reasoning, Reasoning, Answer

---

# Core Statistics

Complete reasoning processes and answers were generated with **gpt-oss-120b** (MXFP4).

The text length of the dataset reflects the depth and complexity of its content. I analyzed the lengths of the **Input** (question), **Reasoning**, and **Answer** fields, and plotted them to make the data distribution easier to read.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/vhb8BN37WVllGjHrmFR1i.png)

---

## Quality and Content Evaluation

This evaluation did not use an LLM scoring model. Instead, two custom quantitative metrics were used to assess data structure and reasoning characteristics:

- **Reasoning Complexity Ratio**: **39.19**
  - *Calculation Method*: Average reasoning characters ÷ Average input characters
  - *Meaning*: Measures how far the model's reasoning chain extends. A higher value means the model provides sufficient reasoning detail even for short questions.
- **Answer Efficiency Ratio**: **0.67**
  - *Calculation Method*: Average answer words ÷ Average reasoning words
  - *Meaning*: Measures how much the reasoning is condensed into the answer. A lower value indicates that the reasoning is divergent while the answer is convergent and concise.
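The two ratios defined above can be reproduced with a few lines of Python. This is a minimal sketch, not the author's evaluation script: it assumes each record is a dict whose keys are `Input`, `Reasoning`, and `Answer` (taken from the field list above; the actual casing in the files may differ), and that "words" means whitespace-separated tokens. The function name `length_ratios` is hypothetical.

```python
from statistics import mean


def length_ratios(records, input_key="Input", reasoning_key="Reasoning",
                  answer_key="Answer"):
    """Return (reasoning_complexity_ratio, answer_efficiency_ratio).

    Key names are assumptions based on the dataset card's field list;
    pass different keys if the files use other names.
    """
    records = list(records)
    # Reasoning Complexity Ratio: average characters, reasoning vs. input.
    avg_input_chars = mean(len(r[input_key]) for r in records)
    avg_reasoning_chars = mean(len(r[reasoning_key]) for r in records)
    # Answer Efficiency Ratio: average whitespace-separated words,
    # answer vs. reasoning (the word definition is an assumption).
    avg_answer_words = mean(len(r[answer_key].split()) for r in records)
    avg_reasoning_words = mean(len(r[reasoning_key].split()) for r in records)
    return (avg_reasoning_chars / avg_input_chars,
            avg_answer_words / avg_reasoning_words)


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call shape.
    sample = [
        {"Input": "Given that 2^x = 8, find x.",
         "Reasoning": "We note that 8 = 2^3, so x must equal 3.",
         "Answer": "The answer is 3."},
    ]
    complexity, efficiency = length_ratios(sample)
    print(f"complexity={complexity:.2f} efficiency={efficiency:.2f}")
```

Both values are ratios of averages, as the definitions state, rather than averages of per-sample ratios; the two can differ noticeably when sample lengths vary widely.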
![image/png](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/GNkBaNsuasJ4Jal9w0ABm.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/66309bd090589b7c65950665/3tQm0sHGSxeTU1ZverdQk.png)

---

## Comprehensive Evaluation

The dataset demonstrates high-quality mathematical problem solving, featuring:

- **Comprehensive Reasoning Chain**: Detailed thought processes and clear logical steps.
- **Rich Mathematical Expression**: Effective use of LaTeX for formula typesetting.
- **Balanced Input-Output Relationship**: The complexity of the reasoning process is reasonably correlated with the complexity of the problem.

---

## Dataset Structure

**File Format**: .jsonl (one sample per line, each an independent JSON object)

To make training easier, I have prepared several data structure templates, covering three common annotation types for different distillation and cleaning logic.

**1. Standard JSON Structure**

To facilitate training reasoning models or building SFT data, the chain of thought and the final answer are stored in separate fields.

```json
{
  "generator": "gpt-oss-120b",
  "category": "math",
  "Input": "Given that 2^x = 8, find x.",
  "CoT_Native_Reasoning": "We note that 8 = 2^3...",
  "answer": "The answer is 3."
}
```

**2. OpenAI Harmony**

Messages are enclosed by the tags `<start>|user|message|...<end>` and `<start>|assistant|...<end>`, aligning with the OpenAI Harmony style.

```json
{
  "generator": "gpt-oss-120b",
  "category": "math",
  "Input": "<start>|user|message|>In triangle ABC with BC=3, ... <end>",
  "output": "<start>|assistant|>We have a right triangle at C, ... <end>"
}
```

**3. Think**

Reasoning content is wrapped in `<think>` and `</think>` tags, and the answer is appended directly after it. The format is similar to that of the Qwen3 series and DeepSeek models.

```json
{
  "generator": "gpt-oss-120b",
  "category": "math",
  "Input": "Solve: If 12x = 36, what is x?",
  "output": "<think>First, divide both sides by 12. 36 / 12 = 3. So x = 3.</think> The answer is 3."
}
```

---

## Training and Usage Recommendations

- **Alignment Training**: For CoT training, make sure the template tags match the ones the target model expects.
- **Evaluation**: Report accuracy both with and without CoT; provide an "answer-in-the-box" parser to stabilize numerical extraction.
- **Safety Thresholds**: Discard erroneous or inconsistent samples rather than keeping them for volume; set an upper length bound for long samples and process them in chunks.

---

## Acknowledgements

The construction of this dataset is based on the generation capabilities of **gpt-oss-120b** and the optimized design of mathematical reasoning templates. Special thanks to the open-source community for their contributions in **mathematical expression formatting**, **data cleaning scripts**, and **visualization analysis**.

**Seed Questions**: Derived in part from *nvidia/Nemotron-Post-Training-Dataset-v1*.

**License**: CC-BY-4.0

**Dataset Citation**:

```
@dataset{jackrong_2025_gpt_oss_math_distill,
  title = {GPT-OSS-120B-Distilled-Reasoning-math},
  author = {Jackrong},
  year = {2025},
  url = {https://huggingface.co/datasets/Jackrong/GPT-OSS-120B-Distilled-Reasoning-math}
}
```
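The Standard JSON structure and the Think structure described in the Dataset Structure section carry the same information, so one can be derived from the other. The sketch below shows one way to do that, assuming the key names from the card's examples (`Input`, `CoT_Native_Reasoning`, `answer`, `output`); the helper names `read_jsonl` and `to_think_format` are hypothetical and not part of any published tooling for this dataset.

```python
import json


def read_jsonl(path):
    """Yield one dict per non-empty line of a .jsonl file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def to_think_format(record):
    """Convert a Standard-structure record into a Think-structure record.

    Key names follow the examples in the dataset card and are assumptions
    about the actual files.
    """
    reasoning = record["CoT_Native_Reasoning"]
    answer = record["answer"]
    return {
        "generator": record["generator"],
        "category": record["category"],
        "Input": record["Input"],
        # Reasoning goes inside <think>...</think>; the answer follows it.
        "output": f"<think>{reasoning}</think> {answer}",
    }


if __name__ == "__main__":
    standard = {
        "generator": "gpt-oss-120b",
        "category": "math",
        "Input": "Given that 2^x = 8, find x.",
        "CoT_Native_Reasoning": "We note that 8 = 2^3...",
        "answer": "The answer is 3.",
    }
    print(json.dumps(to_think_format(standard), ensure_ascii=False))
```

Keeping the conversion as a pure function on one record makes it easy to swap the tag pair when the target model expects a different template, which is the concern raised under Alignment Training.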

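The evaluation recommendation mentions an "answer-in-the-box" parser. The card does not say which convention the answers use; the sketch below assumes the common LaTeX form `\boxed{...}` and returns the contents of the last such box in a string, handling nested braces such as `\boxed{\frac{1}{2}}`. The function name `extract_boxed` is hypothetical.

```python
def extract_boxed(text):
    r"""Return the contents of the last \boxed{...} in text, or None.

    Assumes answers are marked with LaTeX \boxed{...}; this is an
    assumption, since the dataset card does not specify the marker.
    """
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Matching close brace of \boxed{ reached.
                return "".join(chars)
        chars.append(ch)
    return None  # unbalanced braces: treat as no answer found


if __name__ == "__main__":
    print(extract_boxed(r"Dividing both sides by 12 gives \boxed{3}."))
```

Brace counting is used instead of a regular expression because a simple pattern such as `\\boxed\{(.*?)\}` stops at the first closing brace and truncates fractions or other nested LaTeX.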
Provider: maas
Created: 2025-08-18