backtranslated-tir
Source: ModelScope community · Updated 2026-01-02 · Indexed 2025-11-08
Download link:
https://modelscope.cn/datasets/camel-ai/backtranslated-tir
Description:
# Agent-Distilled Math Reasoning (TIR+CoT) Dataset
This dataset contains mathematical problems paired with both tool-integrated reasoning (TIR) traces and corresponding chain-of-thought (CoT) traces, distilled via agent-based pipelines. It is designed for fine-tuning large language models on step-by-step mathematical reasoning and tool-augmented problem solving.
## Training Data
We generate SFT data from multiple data sources to ensure diverse and challenging coverage across mathematical domains. The initial training set consolidates examples from several established benchmarks, as collected by **ToRL** (Li et al., 2025), including:
- **NuminaMATH** (Jia et al., 2024)
- **MATH** (Hendrycks et al., 2021)
- **DeepScaleR** (Luo et al., 2025)
To mitigate data leakage, we further clean the corpus by filtering out any training example whose question text shares a 10-gram with any question in our test sets. This decontamination step supports a fair and reliable assessment of generalization performance. The resulting corpus contains 25,000 math problems in total. After the TIR trace filtering process with the Solver Agent, we obtain 11.6k TIR traces, with an overall accuracy of around 46%.
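The 10-gram filter described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: it assumes lowercased, whitespace-split word tokens (the source does not say how n-grams are tokenized), and the function names `ngrams` and `decontaminate` are hypothetical.

```python
def ngrams(text, n=10):
    """Return the set of word-level n-grams in `text`.

    Assumption: tokens are lowercased, whitespace-separated words.
    """
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train_questions, test_questions, n=10):
    """Keep only training questions that share no n-gram with any test question."""
    test_grams = set()
    for question in test_questions:
        test_grams |= ngrams(question, n)
    return [q for q in train_questions if ngrams(q, n).isdisjoint(test_grams)]


# Toy demonstration with n=3 so that the overlap is visible on short strings.
train = [
    "Find the sum of all positive integers less than ten",
    "Compute the area of a circle with radius two",
]
test = ["What is the sum of all positive integers below five"]
kept = decontaminate(train, test, n=3)  # the first question shares "the sum of"
```

Collecting all test-set n-grams into one set first keeps the check against each training question to a single set-disjointness test.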
## Data Format
The dataset is stored in `train_part.jsonl`, where each line is a JSON object with at least the following fields:
- `problem`: The text of the math problem statement.
- `TIR trace`: The tool-integrated reasoning (TIR) trace generated by the Solver Agent. This field contains the agent's step-by-step plan interleaved with its tool calls and intermediate reasoning, as executed with external tools.
- `CoT trace`: The solution trace, rephrased from the corresponding TIR trace by the Rephrase Agent. This field provides clear, step-by-step reasoning and is suitable for use as the target in supervised fine-tuning.
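A minimal sketch of reading this format with the Python standard library is shown below. The helper names `parse_jsonl` and `load_dataset` are hypothetical, and the local path is an assumption about where the file has been downloaded.

```python
import json

# Field names as documented above; note that two of them contain a space.
REQUIRED_FIELDS = ("problem", "TIR trace", "CoT trace")


def parse_jsonl(lines):
    """Parse an iterable of JSONL lines into dicts and check the documented fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        records.append(record)
    return records


def load_dataset(path="train_part.jsonl"):
    """Read the dataset from `path` (assumed to be a local copy of the file)."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)


# In-memory demonstration that does not require the real file.
sample = json.dumps({"problem": "1 + 1 = ?", "TIR trace": "...", "CoT trace": "2"})
records = parse_jsonl([sample, ""])
```

Because the field names `TIR trace` and `CoT trace` contain spaces, they have to be accessed as dictionary keys (for example `record["CoT trace"]`) rather than as attributes.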
## Intended Use
- **Supervised Fine-Tuning**: The `problem` field can be used as the model input and `CoT trace` as the model output for training models to solve math problems with detailed reasoning.
- **Evaluation**: The dataset can also be used to benchmark reasoning and solution generation capabilities of language models.
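The supervised fine-tuning setup described above maps each record to an input/output pair. A minimal sketch follows; the output keys `prompt` and `completion` are assumptions chosen for illustration (different training frameworks expect different key names), and `to_sft_example` is a hypothetical helper.

```python
def to_sft_example(record):
    """Map one dataset record to an SFT pair.

    `problem` becomes the model input and `CoT trace` the target, as described
    above. The key names "prompt" and "completion" are illustrative assumptions.
    """
    return {"prompt": record["problem"], "completion": record["CoT trace"]}


def to_sft_examples(records):
    """Convert a list of records; the `TIR trace` field is not used for this purpose."""
    return [to_sft_example(record) for record in records]


# Demonstration on a small hand-written record.
demo_record = {
    "problem": "What is 2 + 3?",
    "TIR trace": "<message>...</message>",
    "CoT trace": "Step 1: Add 2 and 3. Final answer: 5.",
}
pairs = to_sft_examples([demo_record])
```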
## Example Entry
```json
{
"problem": "For each positive number x, let f(x) = ((x + 1/x)^6 - (x^6 + 1/x^6) - 2) / ((x + 1/x)^3 + (x^3 + 1/x^3)). Find the minimum value of f(x).",
"TIR trace": "<message>\nStep-by-step plan:\n\n1. Simplify the expression: Use algebraic manipulation to rewrite the numerator and denominator in terms of a simpler variable, such as t = x + 1/x.\n2. Express powers in terms of t: Use the binomial theorem or known identities to express (x + 1/x)^n and x^n + 1/x^n in terms of t.\n3. Rewrite f(x) as a function of t: After substitution, f(x) becomes a rational function f(t).\n4. Find the minimum value of f(t): Use calculus (derivative) or algebraic methods to find critical points of f(t) in the domain t ≥ 2.\n\n...\n\nFinal answer: The minimum value is 6.\n</message>",
"CoT trace": "Step 1: Introduce the substitution t = x + 1/x. Step 2: Express powers in terms of t using identities. Step 3: Rewrite f(x) in terms of t and simplify. Step 4: Analyze f(t) and find the minimum. Final answer: 6."
}
```
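As a sanity check on the example entry, the stated answer can be confirmed numerically. The sketch below is not taken from the dataset's traces: it evaluates f(x) on a grid of positive values and compares it with the closed form 3(x + 1/x), which follows from the substitution t = x + 1/x and is minimized at x = 1. The grid bounds and step size are arbitrary choices.

```python
import math


def f(x):
    """The function from the example problem, defined for positive x."""
    s = x + 1 / x
    numerator = s**6 - (x**6 + 1 / x**6) - 2
    denominator = s**3 + (x**3 + 1 / x**3)
    return numerator / denominator


# Evaluate on a grid covering roughly 0.1 to 10.09 (arbitrary bounds).
xs = [0.1 + 0.01 * k for k in range(1000)]
best_x = min(xs, key=f)
best_value = f(best_x)

# Compare with the simplified form 3 * (x + 1/x) at every grid point.
matches_closed_form = all(
    math.isclose(f(x), 3 * (x + 1 / x), rel_tol=1e-9) for x in xs
)
```

On this grid the smallest value occurs at the point closest to x = 1, where f evaluates to 6, in agreement with the final answer given in both traces.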
Provider:
maas
Created:
2025-09-04



