Dhanishtha-2.0-MATHS
收藏魔搭社区2025-12-05 更新2025-07-12 收录
下载链接:
https://modelscope.cn/datasets/HelpingAI/Dhanishtha-2.0-MATHS
下载链接
链接失效反馈官方服务:
资源简介:
📐 **Dhanishtha-2.0-MATHS**
An expertly curated set of **36.7K math reasoning samples** directly sourced from internal training data used in **Dhanishtha-2.0**, the world’s first LLM trained on Intermediate Thinking (IT). Each sample integrates multi-phase mathematical reasoning, showcasing robust problem decomposition, conceptual insight, and self-correction.
---
### 📊 Overview
- **36.7K mathematical prompts** and solutions
- Format: **Instruction → Output**, ideal for fine-tuning
- Embedded `<think>` blocks for recursive reasoning
- Covers domains like: algebra, geometry, arithmetic, combinatorics, logic, number theory, statistics, and word problems
- Clean formatting for symbolic math, LaTeX expressions, and textual explanations
- Primarily in English
---
### ⚙️ Intermediate Thinking for Mathematics
Rather than generating single-pass solutions, each response features **layered problem solving** and **self-awareness of errors**, enabling learners and models to appreciate the entire process—not just the answer.
```plaintext
<think>
I recognize this as a modular arithmetic problem involving even spacing. Let me first interpret the repeating pattern and identify how it defines a polynomial.
</think>
The values repeat every 3 inputs: (2, 1, 0), which hints at a cyclical polynomial or a periodic system.
<think>
Let me test values for small n and see whether a cubic polynomial can satisfy these constraints and match the given final condition.
</think>
...
```
✅ Encourages **correction mid-solution**
🔁 Supports **step-by-step roughwork decomposition**
📚 Allows **parallel reasoning paths for ambiguity**
---
### 🧮 Sample Content
Each example follows a structured progression:
1. Prompt with a math problem
2. Initial reasoning block: `<think>` exploration
3. One or more intermediate refinements or corrections
4. Optional `<ser>` blocks for emotional framing
5. Final synthesized answer with justification
---
### 🔍 Sample Format
```json
{
"instruction": "If 991 + 993 + 995 + 997 + 999 = 5000 - N, find N.",
"output": "<think>\nThis is a sum of five consecutive odd numbers. I'll compute it using the arithmetic series formula.\n</think>\n\nSum = (5/2)*(991 + 999) = (5/2)*1990 = 5*995 = 4975\n\n<think>\nNow, 5000 - N = 4975 → N = 25\n</think>\n\nAnswer: N = 25"
}
```
---
### 🧠 Emotional Reasoning Blocks (Optional)
Some samples feature structured `<ser>` blocks to map emotional framing during reasoning:
```plaintext
<ser>
Emotion ==> determination
Cause ==> competitive math contest
Mind ==> focused and methodical
Growth ==> confidence-building through precision
</ser>
```
---
### 🧪 Data Collection
- **Source**: Derived from Dhanishtha-2.0’s internal training run
- **Sampling**: Uniformly selected subset (~0.15%) of its mathematics corpus
- **Filtering**: Ensured syntactic correctness of LaTeX and `<think>` tags
- **Languages**: ~93% English, rest multilingual (hi, ta, te, fr, ja, zh, es)
---
### 🧪 Validation & Processing
- Manual pass through 1,000 entries for reasoning trace clarity
- Heuristic validation of math expressions
- Deduplication of variants and aliases
---
### 🔧 Quickstart
```python
from datasets import load_dataset
dataset = load_dataset("HelpingAI/Dhanishtha-2.0-MATHS", split="train")
for row in dataset:
print(row["instruction"])
print(row["output"])
```
---
### 🧪 Intended Use
- Finetuning math-specialized or IT-enabled LLMs
- Training models for math olympiad or competitive reasoning
- Benchmarking multi-step solution quality
- Enhancing models with “show your work” capabilities
---
### 📄 Citation
```bibtex
@misc{HAI2025dhanishthaMATHS,
title = {Dhanishtha-2.0-MATHS: Multi-phase Mathematical Reasoning Dataset with Intermediate Thinking},
author = {Abhay Koul and Varun Gupta},
year = {2025},
publisher = {HelpingAI},
howpublished = {\url{https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-MATHS}}
}
```
---
🧠 *Math isn't just about answers—it's a journey of thought. Dhanishtha-2.0-MATHS invites models to reason, reflect, and evolve.*
📐 **Dhanishtha-2.0-MATHS**
这是一套精心精选的**36.7万个数学推理样本集**,其数据直接源自首款基于中间思维(Intermediate Thinking, IT)训练的大语言模型(Large Language Model, LLM)**Dhanishtha-2.0**的内部训练数据。每一条样本均整合了多阶段数学推理过程,展现了出色的问题拆解、概念洞察与自我修正能力。
---
### 📊 数据集概览
- **36.7万个数学提示与解答**
- 格式:**指令 → 输出**,非常适合微调(fine-tuning)
- 嵌入了用于递归推理的`<think>`代码块
- 覆盖的领域包括:代数、几何、算术、组合数学、逻辑学、数论、统计学,以及应用题
- 针对符号数学、LaTeX表达式与文本解释做了格式化优化
- 主要语言为英语
---
### ⚙️ 数学中间思维框架
不同于单步生成的解答,每一条回复均具备**分层问题求解**与**错误自我觉察**能力,使学习者与模型能够完整理解推理全过程——而非仅关注最终答案。
plaintext
<think>
I recognize this as a modular arithmetic problem involving even spacing. Let me first interpret the repeating pattern and identify how it defines a polynomial.
</think>
The values repeat every 3 inputs: (2, 1, 0), which hints at a cyclical polynomial or a periodic system.
<think>
Let me test values for small n and see whether a cubic polynomial can satisfy these constraints and match the given final condition.
</think>
...
✅ 鼓励**解题中途的自我修正**
🔁 支持**分步草稿式问题拆解**
📚 允许**针对歧义场景的并行推理路径**
---
### 🧮 样本内容结构
每条示例均遵循标准化流程:
1. 包含数学问题的提示文本
2. 初始推理模块:由`<think>`包裹的探索过程
3. 一个或多个中间优化或修正步骤
4. 可选的`<ser>`模块,用于标注推理过程中的情绪框架
5. 最终整合的答案与论证过程
---
### 🔍 样本格式
json
{
"instruction": "若 991 + 993 + 995 + 997 + 999 = 5000 - N,求 N 的值。",
"output": "<think>
本题为五个连续奇数的求和问题,我将使用等差数列求和公式进行计算。
</think>
求和结果 = (5/2)*(991 + 999) = (5/2)*1990 = 5*995 = 4975
<think>
由 5000 - N = 4975 可得 N = 25
</think>
答案:N = 25"
}
---
### 🧠 可选情绪推理模块
部分样本包含结构化的`<ser>`模块,用于标注推理过程中的情绪框架:
plaintext
<ser>
情绪 ==> 坚定
成因 ==> 参与数学竞赛
思维状态 ==> 专注且条理清晰
成长收获 ==> 通过严谨性训练建立自信心
</ser>
---
### 🧪 数据采集
- **数据来源**:源自Dhanishtha-2.0的内部训练流程
- **采样方式**:从其数学语料库中均匀选取的子集(约占0.15%)
- **筛选规则**:确保LaTeX语法与`<think>`标签的句法正确性
- **语言分布**:约93%为英语,其余为多语言(印地语、泰米尔语、泰卢固语、法语、日语、中文、西班牙语)
---
### 🧪 验证与预处理
- 对1000条样本进行人工审核,确保推理轨迹清晰可读
- 对数学表达式进行启发式验证
- 对变体与别名进行去重处理
---
### 🔧 快速上手
python
from datasets import load_dataset
dataset = load_dataset("HelpingAI/Dhanishtha-2.0-MATHS", split="train")
for row in dataset:
print(row["instruction"])
print(row["output"])
---
### 🧪 预期用途
- 针对数学专项或具备中间思维能力的大语言模型进行微调
- 训练具备数学奥林匹克或竞赛级推理能力的模型
- 对多步解答的质量进行基准测试
- 提升模型的“展示解题过程”能力
---
### 📄 引用格式
bibtex
@misc{HAI2025dhanishthaMATHS,
title = {Dhanishtha-2.0-MATHS: Multi-phase Mathematical Reasoning Dataset with Intermediate Thinking},
author = {Abhay Koul and Varun Gupta},
year = {2025},
publisher = {HelpingAI},
howpublished = {url{https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-MATHS}}
}
---
🧠 *数学并非仅关乎答案——它是一场思维的旅程。Dhanishtha-2.0-MATHS邀请模型展开推理、反思并进化。*
提供机构:
maas
创建时间:
2025-07-07



