mai-ll/ipw-sft-trajectories

Name: mai-ll/ipw-sft-trajectories
Creator: mai-ll
Published: 2026-03-30 21:41:26
License: (no description provided)

Source: Hugging Face. Updated 2026-03-30; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/mai-ll/ipw-sft-trajectories

Description:

---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- sft
- tool-use
- orchestrator
- trajectories
size_categories:
- 10K<n<100K
---

# IPW SFT Trajectories

Supervised fine-tuning trajectories for training tool-use orchestrator models. Each trajectory contains a multi-turn conversation where a model solves a task by selecting and using tools (calculator, code interpreter, think).

## Dataset Details

| Stat | Value |
|------|-------|
| Total trajectories (deduped, correct only) | 44,301 |
| Total trajectories (with all opus) | 44,866 |
| Categories | 2,122 |
| Avg turns per trajectory | 1.2 |
| Source dataset | GeneralThought |

### Teacher Model Breakdown

| Teacher | Count |
|---------|-------|
| Kimi-K2 (1T MoE) | 43,556 |
| Claude Sonnet (regraded) | 735 |
| Claude Opus | 10 |

### Top Categories

| Category | Count |
|----------|-------|
| High School Math | 13,245 |
| Math Olympiads | 4,041 |
| Medical Exams | 3,915 |
| Open Conversations | 2,962 |
| Text Classification | 1,551 |
| Explanation | 1,447 |
| Closed Question Answering | 1,229 |
| AIME Math | 1,174 |
| General Math | 884 |
| General Question Answering | 740 |

## Files

- **`all_correct_trajectories_deduped.jsonl`** -- 44,301 trajectories. All correct (success=true), deduplicated by sample_id across all sources.
- **`all_correct_trajectories.jsonl`** -- 44,866 trajectories. Same as above, plus all 575 Opus trajectories regardless of success (for diversity from a stronger model).

## Schema

Each line is a JSON object with:

```json
{
  "sample_id": "traj_e719b342",
  "conversations": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "How much power is dissipated..."},
    {"role": "assistant", "content": "THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24"},
    {"role": "tool", "name": "calculator", "content": "600"},
    {"role": "assistant", "content": "THOUGHT: ... \nFINAL_ANSWER: 600 watts"}
  ],
  "tool_calls": [...],
  "ground_truth": "600 W",
  "final_answer": "600 watts of power are dissipated in the resistor.",
  "success": true,
  "total_energy_joules": 0.0,
  "total_latency_seconds": 33.88,
  "total_cost_usd": 0.0,
  "total_tokens": 1675,
  "num_turns": 2,
  "source_dataset": "generalthought",
  "category": "AC Circuits",
  "teacher_model": "vllm:kimi-k2@8001",
  "tools_used": ["calculator"]
}
```

## Available Tools

The orchestrator can select from: `calculator`, `think`, `code_interpreter`, and various LLM-based tools.

## Usage

```python
from datasets import load_dataset

ds = load_dataset(
    "mai-ll/ipw-sft-trajectories",
    split="train",
    data_files="all_correct_trajectories_deduped.jsonl",
)
```

## Generation

Trajectories were generated using the [ipw_internal](https://github.com/HazyResearch/ipw_internal) pipeline:

1. Teacher models (Kimi-K2, Claude Sonnet, Claude Opus) solve tasks from GeneralThought using tool-use
2. Results are verified against ground truth
3. Failed trajectories are regraded with Claude Haiku for accuracy
4. Correct trajectories are deduplicated and combined
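
## Example: Filtering and Parsing Records

The sketch below illustrates, on in-memory JSONL lines, how the `success` and `sample_id` fields described under Schema and Files could be used to keep correct, deduplicated trajectories, and how an assistant turn might be split into a tool call or a final answer. It is not the ipw_internal pipeline's code. The helper names (`dedupe_correct`, `parse_assistant_turn`), the "first record wins" dedup policy, and the exact `TOOL:` / `INPUT:` / `FINAL_ANSWER:` layout are assumptions inferred from the single example record, so real turns may need a more tolerant parser.

```python
import json
import re

# Example record trimmed from the Schema section; elided content stays as "...".
SAMPLE = {
    "sample_id": "traj_e719b342",
    "conversations": [
        {"role": "user", "content": "How much power is dissipated..."},
        {"role": "assistant", "content": "THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24"},
        {"role": "tool", "name": "calculator", "content": "600"},
        {"role": "assistant", "content": "THOUGHT: ... \nFINAL_ANSWER: 600 watts"},
    ],
    "success": True,
    "teacher_model": "vllm:kimi-k2@8001",
}


def dedupe_correct(lines):
    """Keep the first successful record per sample_id (assumed policy)."""
    seen, kept = set(), []
    for line in lines:
        record = json.loads(line)
        if not record.get("success") or record["sample_id"] in seen:
            continue
        seen.add(record["sample_id"])
        kept.append(record)
    return kept


# Assumed turn layout: "THOUGHT: ... TOOL: <name> INPUT: <text>".
TOOL_CALL = re.compile(r"TOOL:\s*(?P<tool>\S+)\s+INPUT:\s*(?P<input>.*)", re.DOTALL)


def parse_assistant_turn(content):
    """Return a tool call, a final answer, or an empty dict if neither is found."""
    match = TOOL_CALL.search(content)
    if match:
        return {"tool": match["tool"], "input": match["input"].strip()}
    if "FINAL_ANSWER:" in content:
        return {"final_answer": content.split("FINAL_ANSWER:", 1)[1].strip()}
    return {}


# A duplicate and a failed record are both dropped by dedupe_correct.
failed = dict(SAMPLE, sample_id="traj_hypothetical_fail", success=False)
lines = [json.dumps(SAMPLE), json.dumps(SAMPLE), json.dumps(failed)]
records = dedupe_correct(lines)
turns = [
    parse_assistant_turn(msg["content"])
    for msg in records[0]["conversations"]
    if msg["role"] == "assistant"
]
```

The published counts fit this reading of the two files: the deduplicated file already holds 10 Opus trajectories, so adding the remaining Opus trajectories gives 44,301 + (575 - 10) = 44,866.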


Provided by:

mai-ll
