Dhanishtha-2.0-MATHS

ModelScope community · updated 2025-12-05 · indexed 2025-07-12
Download link:
https://modelscope.cn/datasets/HelpingAI/Dhanishtha-2.0-MATHS
Description:
📐 **Dhanishtha-2.0-MATHS**

An expertly curated set of **36.7K math reasoning samples** directly sourced from internal training data used in **Dhanishtha-2.0**, the world’s first LLM trained on Intermediate Thinking (IT). Each sample integrates multi-phase mathematical reasoning, showcasing robust problem decomposition, conceptual insight, and self-correction.

---

### 📊 Overview

- **36.7K mathematical prompts** and solutions
- Format: **Instruction → Output**, ideal for fine-tuning
- Embedded `<think>` blocks for recursive reasoning
- Covers domains like: algebra, geometry, arithmetic, combinatorics, logic, number theory, statistics, and word problems
- Clean formatting for symbolic math, LaTeX expressions, and textual explanations
- Primarily in English

---

### ⚙️ Intermediate Thinking for Mathematics

Rather than generating single-pass solutions, each response features **layered problem solving** and **self-awareness of errors**, enabling learners and models to appreciate the entire process, not just the answer.

```plaintext
<think>
I recognize this as a modular arithmetic problem involving even spacing. Let me first interpret the repeating pattern and identify how it defines a polynomial.
</think>

The values repeat every 3 inputs: (2, 1, 0), which hints at a cyclical polynomial or a periodic system.

<think>
Let me test values for small n and see whether a cubic polynomial can satisfy these constraints and match the given final condition.
</think>

...
```

- ✅ Encourages **correction mid-solution**
- 🔁 Supports **step-by-step roughwork decomposition**
- 📚 Allows **parallel reasoning paths for ambiguity**

---

### 🧮 Sample Content

Each example follows a structured progression:

1. Prompt with a math problem
2. Initial reasoning block: `<think>` exploration
3. One or more intermediate refinements or corrections
4. Optional `<ser>` blocks for emotional framing
5. Final synthesized answer with justification

---

### 🔍 Sample Format

```json
{
  "instruction": "If 991 + 993 + 995 + 997 + 999 = 5000 - N, find N.",
  "output": "<think>\nThis is a sum of five consecutive odd numbers. I'll compute it using the arithmetic series formula.\n</think>\n\nSum = (5/2)*(991 + 999) = (5/2)*1990 = 5*995 = 4975\n\n<think>\nNow, 5000 - N = 4975 → N = 25\n</think>\n\nAnswer: N = 25"
}
```

---

### 🧠 Emotional Reasoning Blocks (Optional)

Some samples feature structured `<ser>` blocks to map emotional framing during reasoning:

```plaintext
<ser>
Emotion ==> determination
Cause ==> competitive math contest
Mind ==> focused and methodical
Growth ==> confidence-building through precision
</ser>
```

---

### 🧪 Data Collection

- **Source**: Derived from Dhanishtha-2.0’s internal training run
- **Sampling**: Uniformly selected subset (~0.15%) of its mathematics corpus
- **Filtering**: Ensured syntactic correctness of LaTeX and `<think>` tags
- **Languages**: ~93% English, rest multilingual (hi, ta, te, fr, ja, zh, es)

---

### 🧪 Validation & Processing

- Manual pass through 1,000 entries for reasoning trace clarity
- Heuristic validation of math expressions
- Deduplication of variants and aliases

---

### 🔧 Quickstart

```python
from datasets import load_dataset

# Load the training split from the Hub.
dataset = load_dataset("HelpingAI/Dhanishtha-2.0-MATHS", split="train")

# Each row holds a problem ("instruction") and its reasoning trace ("output").
for row in dataset:
    print(row["instruction"])
    print(row["output"])
```

---

### 🧪 Intended Use

- Finetuning math-specialized or IT-enabled LLMs
- Training models for math olympiad or competitive reasoning
- Benchmarking multi-step solution quality
- Enhancing models with “show your work” capabilities

---

### 📄 Citation

```bibtex
@misc{HAI2025dhanishthaMATHS,
  title        = {Dhanishtha-2.0-MATHS: Multi-phase Mathematical Reasoning Dataset with Intermediate Thinking},
  author       = {Abhay Koul and Varun Gupta},
  year         = {2025},
  publisher    = {HelpingAI},
  howpublished = {\url{https://huggingface.co/datasets/HelpingAI/Dhanishtha-2.0-MATHS}}
}
```

---

🧠 *Math isn't just about answers; it's a journey of thought. Dhanishtha-2.0-MATHS invites models to reason, reflect, and evolve.*
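
The sample format described in this card interleaves `<think>` blocks (and optionally `<ser>` blocks) with visible solution text inside a single `output` string. As a minimal sketch of how such a string might be split into ordered segments, the snippet below uses a regular expression over the example record quoted in the card. The names `BLOCK_RE` and `split_output` are hypothetical helpers written for illustration, not part of the dataset or of any library, and the approach assumes tags are well-formed and never nested.

```python
import re

# Matches one <think>...</think> or <ser>...</ser> block.
# re.DOTALL lets "." span the newlines inside a block.
# Assumption: tags are well-formed and not nested.
BLOCK_RE = re.compile(r"<(think|ser)>(.*?)</\1>", re.DOTALL)


def split_output(output: str) -> list[tuple[str, str]]:
    """Split an `output` string into ordered (kind, text) segments.

    kind is "think", "ser", or "text" (visible text between blocks).
    This is an illustrative helper, not an official parser.
    """
    segments = []
    cursor = 0
    for match in BLOCK_RE.finditer(output):
        # Visible text that sits between the previous block and this one.
        visible = output[cursor:match.start()].strip()
        if visible:
            segments.append(("text", visible))
        segments.append((match.group(1), match.group(2).strip()))
        cursor = match.end()
    # Anything after the last block, typically the final answer.
    tail = output[cursor:].strip()
    if tail:
        segments.append(("text", tail))
    return segments


# The example record from the card's "Sample Format" section.
sample_output = (
    "<think>\nThis is a sum of five consecutive odd numbers. "
    "I'll compute it using the arithmetic series formula.\n</think>\n\n"
    "Sum = (5/2)*(991 + 999) = (5/2)*1990 = 5*995 = 4975\n\n"
    "<think>\nNow, 5000 - N = 4975 → N = 25\n</think>\n\n"
    "Answer: N = 25"
)

segments = split_output(sample_output)
for kind, text in segments:
    print(f"[{kind}] {text}")
```

Separating reasoning from visible text this way could be useful when, for example, only the final answer should be scored or when `<ser>` blocks should be filtered out before fine-tuning. Records with malformed or nested tags would need a stricter parser.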

Provider: maas
Created: 2025-07-07