mai-ll/ipw-sft-trajectories
Source: Hugging Face; updated 2026-03-30; indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/mai-ll/ipw-sft-trajectories
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- sft
- tool-use
- orchestrator
- trajectories
size_categories:
- 10K<n<100K
---
# IPW SFT Trajectories
Supervised fine-tuning (SFT) trajectories for training tool-use orchestrator models. Each trajectory is a multi-turn conversation in which a model solves a task by selecting and calling tools (`calculator`, `code_interpreter`, `think`).
## Dataset Details
| Stat | Value |
|------|-------|
| Total trajectories (deduped, correct only) | 44,301 |
| Total trajectories (with all Opus trajectories) | 44,866 |
| Categories | 2,122 |
| Avg turns per trajectory | 1.2 |
| Source dataset | GeneralThought |
### Teacher Model Breakdown
| Teacher | Count |
|---------|-------|
| Kimi-K2 (1T MoE) | 43,556 |
| Claude Sonnet (regraded) | 735 |
| Claude Opus | 10 |
### Top Categories
| Category | Count |
|----------|-------|
| High School Math | 13,245 |
| Math Olympiads | 4,041 |
| Medical Exams | 3,915 |
| Open Conversations | 2,962 |
| Text Classification | 1,551 |
| Explanation | 1,447 |
| Closed Question Answering | 1,229 |
| AIME Math | 1,174 |
| General Math | 884 |
| General Question Answering | 740 |
## Files
- **`all_correct_trajectories_deduped.jsonl`**: 44,301 trajectories. All are correct (`success=true`) and deduplicated by `sample_id` across all sources.
- **`all_correct_trajectories.jsonl`**: 44,866 trajectories. Same as above, plus all 575 Claude Opus trajectories regardless of success, included for diversity from a stronger model.
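The counts reported in this card can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from the tables above. The reading that the 10 correct Opus trajectories are already part of the deduplicated file is an assumption that happens to reconcile the totals; the card does not state it.

```python
# Counts copied from the tables in this card.
teacher_counts = {
    "Kimi-K2": 43_556,
    "Claude Sonnet (regraded)": 735,
    "Claude Opus": 10,
}
deduped_total = 44_301  # all_correct_trajectories_deduped.jsonl
full_total = 44_866     # all_correct_trajectories.jsonl
opus_all = 575          # Opus trajectories, regardless of success

# The teacher breakdown sums to the deduplicated total.
assert sum(teacher_counts.values()) == deduped_total

# Rows that appear only in the larger file.
extra_rows = full_total - deduped_total
print(extra_rows)  # 565

# Assumption: the 10 correct Opus rows are already in the deduplicated file,
# so adding all 575 Opus rows contributes 575 - 10 new ones.
assert extra_rows == opus_all - teacher_counts["Claude Opus"]
```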
## Schema
Each line is a JSON object with:
```json
{
  "sample_id": "traj_e719b342",
  "conversations": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "How much power is dissipated..."},
    {"role": "assistant", "content": "THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24"},
    {"role": "tool", "name": "calculator", "content": "600"},
    {"role": "assistant", "content": "THOUGHT: ... FINAL_ANSWER: 600 watts"}
  ],
  "tool_calls": [...],
  "ground_truth": "600 W",
  "final_answer": "600 watts of power are dissipated in the resistor.",
  "success": true,
  "total_energy_joules": 0.0,
  "total_latency_seconds": 33.88,
  "total_cost_usd": 0.0,
  "total_tokens": 1675,
  "num_turns": 2,
  "source_dataset": "generalthought",
  "category": "AC Circuits",
  "teacher_model": "vllm:kimi-k2@8001",
  "tools_used": ["calculator"]
}
```
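The assistant turns in the example record appear to follow a `THOUGHT: ... TOOL: ... INPUT: ...` or `THOUGHT: ... FINAL_ANSWER: ...` layout. The sketch below splits such a turn into its parts. The marker grammar is inferred from that single example only, and `parse_assistant_turn` is a hypothetical helper, so real records may need a more forgiving parser.

```python
import re

# Assumed markers, inferred only from the example record in this card.
STEP_RE = re.compile(
    r"THOUGHT:\s*(?P<thought>.*?)\s*"
    r"(?:TOOL:\s*(?P<tool>\S+)\s*INPUT:\s*(?P<input>.*)"
    r"|FINAL_ANSWER:\s*(?P<final>.*))\Z",
    re.DOTALL,
)

def parse_assistant_turn(content: str) -> dict:
    """Split one assistant message into thought plus tool call or final answer."""
    match = STEP_RE.match(content.strip())
    if match is None:
        # Unknown layout: return the raw text instead of guessing.
        return {"raw": content}
    # Drop the groups that did not take part in the match.
    return {k: v for k, v in match.groupdict().items() if v is not None}

call = parse_assistant_turn("THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24")
done = parse_assistant_turn("THOUGHT: ... FINAL_ANSWER: 600 watts")
print(call["tool"], "|", call["input"])  # calculator | 5^2 * 24
print(done["final"])                     # 600 watts
```

Returning the raw text for unmatched turns keeps the helper from silently mislabeling messages that use a different layout.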
## Available Tools
The orchestrator can select from: `calculator`, `think`, `code_interpreter`, and various LLM-based tools.
## Usage
```python
from datasets import load_dataset

# Load the deduplicated, correct-only file; a single data file is exposed as the "train" split.
ds = load_dataset("mai-ll/ipw-sft-trajectories", split="train", data_files="all_correct_trajectories_deduped.jsonl")
```
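Once loaded, records can be filtered and reshaped for a chat-style trainer. The sketch below runs on a small in-memory list so it works offline. The helper names `uses_only` and `to_chat_example`, the output key `messages`, and the stand-in rows are illustrative assumptions, not part of the dataset.

```python
def uses_only(record: dict, allowed: set) -> bool:
    """True when the trajectory succeeded and used no tool outside `allowed`."""
    return record["success"] and set(record["tools_used"]) <= allowed

def to_chat_example(record: dict) -> dict:
    """Keep only an id and the conversation turns (output key names are assumed)."""
    return {"id": record["sample_id"], "messages": record["conversations"]}

# Tiny stand-in rows shaped like the schema in this card (values are illustrative).
rows = [
    {"sample_id": "traj_a", "success": True, "tools_used": ["calculator"],
     "conversations": [{"role": "user", "content": "..."}]},
    {"sample_id": "traj_b", "success": True, "tools_used": ["code_interpreter"],
     "conversations": [{"role": "user", "content": "..."}]},
]

selected = [to_chat_example(r) for r in rows if uses_only(r, {"calculator", "think"})]
print([s["id"] for s in selected])  # ['traj_a']
```

With the real dataset, the same predicate can be passed to `ds.filter(lambda r: uses_only(r, {"calculator", "think"}))`.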
## Generation
Trajectories were generated using the [ipw_internal](https://github.com/HazyResearch/ipw_internal) pipeline:
1. Teacher models (Kimi-K2, Claude Sonnet, Claude Opus) solve tasks from GeneralThought with tool use.
2. Results are verified against the ground truth.
3. Failed trajectories are regraded with Claude Haiku for accuracy.
4. Correct trajectories are deduplicated and combined.
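Steps 2 to 4 can be pictured with the minimal sketch below. It is not the `ipw_internal` implementation: the substring check stands in for whatever verifier the pipeline uses, the `regrade` callback stands in for the Claude Haiku regrading call, and the names `normalize` and `select_correct` are hypothetical.

```python
from typing import Callable, Iterable

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a crude comparison."""
    return " ".join(text.lower().split())

def select_correct(
    trajectories: Iterable[dict],
    regrade: Callable[[dict], bool],
) -> list:
    """Verify, regrade failures, then keep one correct trajectory per sample_id."""
    kept = {}
    for traj in trajectories:
        # Step 2: placeholder check against ground truth (the real grader is not described).
        ok = normalize(traj["ground_truth"]) in normalize(traj["final_answer"])
        # Step 3: give failed trajectories a second look via a caller-supplied grader.
        if not ok:
            ok = regrade(traj)
        # Step 4: keep the first correct trajectory seen for each sample_id.
        if ok and traj["sample_id"] not in kept:
            kept[traj["sample_id"]] = {**traj, "success": True}
    return list(kept.values())

demo = [
    {"sample_id": "traj_1", "ground_truth": "600 W", "final_answer": "600 watts"},
    {"sample_id": "traj_1", "ground_truth": "600 W", "final_answer": "600 W"},  # duplicate id
    {"sample_id": "traj_2", "ground_truth": "4", "final_answer": "four"},      # fails the check
]
# The lambda is a toy regrader that accepts the spelled-out answer.
result = select_correct(demo, regrade=lambda t: t["final_answer"] == "four")
print([t["sample_id"] for t in result])  # ['traj_1', 'traj_2']
```

Keeping the first correct trajectory per `sample_id` is one possible deduplication policy; the card does not say which duplicate the pipeline retains.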
Provider: mai-ll



