
mai-ll/ipw-sft-trajectories

Source: Hugging Face (updated 2026-03-30, indexed 2026-04-12)
Download link:
https://hf-mirror.com/datasets/mai-ll/ipw-sft-trajectories
Description:
---
language:
- en
license: apache-2.0
task_categories:
- text-generation
- question-answering
tags:
- sft
- tool-use
- orchestrator
- trajectories
size_categories:
- 10K<n<100K
---

# IPW SFT Trajectories

Supervised fine-tuning (SFT) trajectories for training tool-use orchestrator models. Each trajectory is a multi-turn conversation in which a model solves a task by selecting and calling tools (calculator, code interpreter, think).

## Dataset Details

| Stat | Value |
|------|-------|
| Total trajectories (deduplicated, correct only) | 44,301 |
| Total trajectories (including all Opus trajectories) | 44,866 |
| Categories | 2,122 |
| Avg. turns per trajectory | 1.2 |
| Source dataset | GeneralThought |

### Teacher Model Breakdown

Counts refer to the deduplicated, correct-only file (they sum to 44,301).

| Teacher | Count |
|---------|-------|
| Kimi-K2 (1T-parameter MoE) | 43,556 |
| Claude Sonnet (regraded) | 735 |
| Claude Opus | 10 |

### Top Categories

| Category | Count |
|----------|-------|
| High School Math | 13,245 |
| Math Olympiads | 4,041 |
| Medical Exams | 3,915 |
| Open Conversations | 2,962 |
| Text Classification | 1,551 |
| Explanation | 1,447 |
| Closed Question Answering | 1,229 |
| AIME Math | 1,174 |
| General Math | 884 |
| General Question Answering | 740 |

## Files

- **`all_correct_trajectories_deduped.jsonl`**: 44,301 trajectories. All are correct (`success=true`) and deduplicated by `sample_id` across all sources.
- **`all_correct_trajectories.jsonl`**: 44,866 trajectories (565 more than the deduplicated file). This file contains the same data, but it includes all 575 Opus trajectories regardless of success rather than only the 10 correct ones, to add diversity from a stronger model. Despite its name, the file may therefore contain unsuccessful trajectories.

## Schema

Each line is a JSON object:

```json
{
  "sample_id": "traj_e719b342",
  "conversations": [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "How much power is dissipated..."},
    {"role": "assistant", "content": "THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24"},
    {"role": "tool", "name": "calculator", "content": "600"},
    {"role": "assistant", "content": "THOUGHT: ...\nFINAL_ANSWER: 600 watts"}
  ],
  "tool_calls": [...],
  "ground_truth": "600 W",
  "final_answer": "600 watts of power are dissipated in the resistor.",
  "success": true,
  "total_energy_joules": 0.0,
  "total_latency_seconds": 33.88,
  "total_cost_usd": 0.0,
  "total_tokens": 1675,
  "num_turns": 2,
  "source_dataset": "generalthought",
  "category": "AC Circuits",
  "teacher_model": "vllm:kimi-k2@8001",
  "tools_used": ["calculator"]
}
```

## Available Tools

The orchestrator can select from `calculator`, `think`, `code_interpreter`, and various LLM-based tools.

## Usage

```python
from datasets import load_dataset

# Load the deduplicated, correct-only file as a single "train" split
ds = load_dataset(
    "mai-ll/ipw-sft-trajectories",
    split="train",
    data_files="all_correct_trajectories_deduped.jsonl",
)
```

## Generation

Trajectories were generated with the [ipw_internal](https://github.com/HazyResearch/ipw_internal) pipeline:

1. Teacher models (Kimi-K2, Claude Sonnet, Claude Opus) solve tasks from GeneralThought using tools.
2. Results are verified against the ground truth.
3. Trajectories that fail verification are regraded with Claude Haiku to check their accuracy.
4. Correct trajectories are deduplicated and combined.
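The following sketch illustrates the verify, regrade, and deduplicate steps of the Generation section. It is not the ipw_internal implementation, and it makes three assumptions:

- `exact_match` is a deliberately naive, hypothetical stand-in for the real verifier.
- The `regrade` callback stands in for the Claude Haiku regrader.
- When a `sample_id` repeats, the first correct record is kept.

```python
import json
from typing import Callable, Iterable


# Hypothetical stand-in for the pipeline's answer check: a naive,
# case-insensitive string comparison. The real verifier is not described here.
def exact_match(final_answer: str, ground_truth: str) -> bool:
    return final_answer.strip().lower() == ground_truth.strip().lower()


def build_correct_set(
    records: Iterable[dict],
    verify: Callable[[str, str], bool] = exact_match,
    regrade: Callable[[dict], bool] = lambda rec: False,
) -> list:
    """Keep records that pass verification (or regrading), one per sample_id."""
    seen = set()
    kept = []
    for rec in records:
        ok = verify(rec["final_answer"], rec["ground_truth"])
        if not ok:
            # Step 3: failed records get a second opinion (stand-in for the Haiku regrader)
            ok = regrade(rec)
        if not ok or rec["sample_id"] in seen:
            continue  # drop incorrect records and duplicate sample_ids
        # Step 4: the first correct record per sample_id wins (an assumption)
        seen.add(rec["sample_id"])
        kept.append(rec)
    return kept


# Toy JSONL lines (made up for illustration) using a subset of the Schema fields
lines = [
    '{"sample_id": "a", "final_answer": "600 W", "ground_truth": "600 w"}',
    '{"sample_id": "a", "final_answer": "600 W", "ground_truth": "600 W"}',
    '{"sample_id": "b", "final_answer": "600 watts", "ground_truth": "600 W"}',
]
records = [json.loads(line) for line in lines]
print([r["sample_id"] for r in build_correct_set(records)])  # ['a']
print([r["sample_id"] for r in build_correct_set(records, regrade=lambda r: True)])  # ['a', 'b']
```

Passing the regrader as a callback keeps the filtering logic independent of any particular grading model.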

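The assistant turns in the Schema example follow a plain-text convention:

- Tool-calling turns look like `THOUGHT: ... TOOL: <name> INPUT: <args>`.
- Answering turns look like `THOUGHT: ...` followed by `FINAL_ANSWER: <answer>`.

The minimal sketch below parses that convention and pairs each tool call with the `tool` message that follows it. The regular expressions are inferred from that single example and are assumptions. Check them against real records before relying on them.

```python
import re
from typing import List, Optional, Tuple

# Patterns inferred from the single Schema example; the real format may vary.
THOUGHT_RE = re.compile(r"THOUGHT:\s*(.*?)(?=\s*(?:TOOL:|FINAL_ANSWER:)|\Z)", re.DOTALL)
TOOL_RE = re.compile(r"TOOL:\s*(\S+)\s+INPUT:\s*(.*)", re.DOTALL)
FINAL_RE = re.compile(r"FINAL_ANSWER:\s*(.*)", re.DOTALL)


def parse_assistant_turn(content: str) -> dict:
    """Split one assistant message into thought, tool call, and final answer."""
    thought = THOUGHT_RE.search(content)
    tool = TOOL_RE.search(content)
    final = FINAL_RE.search(content)
    return {
        "thought": thought.group(1).strip() if thought else "",
        "tool": tool.group(1) if tool else None,
        "tool_input": tool.group(2).strip() if tool else None,
        "final_answer": final.group(1).strip() if final else None,
    }


def tool_trace(conversations: List[dict]) -> List[Tuple[str, str, str]]:
    """Pair each assistant tool call with the content of the next `tool` message."""
    trace = []
    pending: Optional[Tuple[str, str]] = None
    for msg in conversations:
        if msg["role"] == "assistant":
            step = parse_assistant_turn(msg["content"])
            if step["tool"] is not None:
                pending = (step["tool"], step["tool_input"])
        elif msg["role"] == "tool" and pending is not None:
            trace.append((pending[0], pending[1], msg["content"]))
            pending = None
    return trace


conversation = [
    {"role": "user", "content": "How much power is dissipated..."},
    {"role": "assistant", "content": "THOUGHT: ... TOOL: calculator INPUT: 5^2 * 24"},
    {"role": "tool", "name": "calculator", "content": "600"},
    {"role": "assistant", "content": "THOUGHT: ...\nFINAL_ANSWER: 600 watts"},
]
print(tool_trace(conversation))  # [('calculator', '5^2 * 24', '600')]
print(parse_assistant_turn(conversation[-1]["content"])["final_answer"])  # 600 watts
```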
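A short standard-library pass can recompute statistics like those in Dataset Details from a downloaded JSONL file. This sketch makes the following assumptions:

- Every record has the `teacher_model`, `category`, and `num_turns` fields shown in the Schema, and may have `tools_used`.
- "Avg. turns" is the mean of `num_turns`. The card does not say how its 1.2 figure was computed.
- Teachers are counted by raw `teacher_model` strings (e.g. `vllm:kimi-k2@8001`), not by the display names in the table.

The two sample records are made up for illustration.

```python
import json
from collections import Counter
from typing import Iterable


def summarize(lines: Iterable[str], top_k: int = 10) -> dict:
    """Recompute Dataset Details-style statistics from JSONL lines."""
    teachers: Counter = Counter()
    categories: Counter = Counter()
    tools: Counter = Counter()
    total_turns = 0
    n = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        rec = json.loads(line)
        n += 1
        teachers[rec["teacher_model"]] += 1
        categories[rec["category"]] += 1
        tools.update(rec.get("tools_used", []))
        total_turns += rec["num_turns"]
    return {
        "total": n,
        "avg_turns": total_turns / n if n else 0.0,
        "teachers": dict(teachers),
        "top_categories": categories.most_common(top_k),
        "tools": dict(tools),
    }


# Two made-up records using fields from the Schema example
sample = [
    '{"teacher_model": "vllm:kimi-k2@8001", "category": "AC Circuits", "num_turns": 2, "tools_used": ["calculator"]}',
    '{"teacher_model": "vllm:kimi-k2@8001", "category": "High School Math", "num_turns": 1, "tools_used": []}',
]
stats = summarize(sample)
print(stats["total"], stats["avg_turns"])  # 2 1.5
print(stats["top_categories"])  # [('AC Circuits', 1), ('High School Math', 1)]
```

To run it on the real data, pass an open file handle, e.g. `summarize(open("all_correct_trajectories_deduped.jsonl", encoding="utf-8"))`. File objects iterate line by line, so the file is never loaded into memory at once.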
Provided by:
mai-ll