OpenThoughts-Agent-v1-SFT
收藏魔搭社区2025-12-18 更新2025-12-13 收录
下载链接:
https://modelscope.cn/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT
下载链接
链接失效反馈官方服务:
资源简介:
<p align="center">
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%">
</p>
<p align="center">
<a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">Project</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a>
</p>
# OpenThinker-Agent-v1-SFT
**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent) and our research codebase.
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as **Terminal-Bench 2.0** and **SWE-Bench**.
The [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is post-trained from [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
It is SFT-ed on the [OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) dataset, then RL-ed on the [OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) dataset.
This [OpenThinker-Agent-v1-SFT](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) model is the model after the SFT stage. For the model after both SFT and RL stages, see [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1).
- **Homepage:** https://www.open-thoughts.ai/agent
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent
# OpenThinker-Agent-v1 Model Performance
Our [OpenThinker-Agent-v1](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) model is the state-of-the-art model at its scale on agent benchmarks.
| Model | Harness | Terminal-Bench 2.0 | SWE-Bench Verified | OpenThoughts-TB-Dev |
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 |
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 |
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 |
# Data
We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**.
Each stage required its own data pipeline – RL tasks (instructions, environments, and verifiers) and SFT traces from strong teacher agents completing tasks.
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two different data sources we curate:
- **nl2bash**: Simple synthetically generated tasks where the agent has to format shell commands effectively
- **InferredBugs**: A set of bugs in C# and Java collected by Microsoft that we turned into tasks
We generate the traces using QuantTrio/GLM-4.6-AWQ and Terminus-2 agentic harness with a maximum of 32 turns. We use the default sampling parameters from vLLM and a maximum context length of 64K. Please see our example [data generation script](https://github.com/open-thoughts/OpenThoughts-Agent/blob/main/notebook/datagen_sft_tutorial.ipynb) for more details.
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset.
To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever hit the learner:
1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers.
2. Environment stability: remove tasks whose containers take too long to build or tear down.
Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 4e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- total_train_batch_size: 16
- total_eval_batch_size: 128
- optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.98) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 7.0
### Framework versions
- Transformers 4.56.0
- Pytorch 2.9.0+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1
# Links
- 🌐 [OpenThoughts-Agent project page](https://open-thoughts.ai/blog/agent)
- 💻 [OpenThoughts-Agent GitHub repository](https://github.com/open-thoughts/OpenThoughts-Agent)
- 🧠 [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)
- 🧠 [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)
- 🧠 [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)
- 🤖 [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)
- 🤖 [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) --> this model
# Citation
```
@misc{openthoughts-agent,
author = {Team, OpenThoughts-Agent},
month = Dec,
title = {{OpenThoughts-Agent}},
howpublished = {https://open-thoughts.ai/agent},
year = {2025}
}
```
<p align="center">
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%">
</p>
<p align="center">
<a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">项目</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">监督微调(Supervised Fine-Tuning,SFT)数据集</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">强化学习(Reinforcement Learning,RL)数据集</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">监督微调模型</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">强化学习模型</a>
</p>
# OpenThinker-Agent-v1-SFT
**OpenThoughts-Agent** 是一项开源项目,旨在遴选用于训练智能体(AI Agent)的优质数据集。我们的首个发布版本包含数据集集合、模型集合以及研究代码库。
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) 是针对智能体任务训练的模型,例如 **Terminal-Bench 2.0** 与 **SWE-Bench**。
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) 模型基于 [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) 进行后续微调。该模型首先在 [OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) 数据集上完成监督微调(Supervised Fine-Tuning,SFT),随后在 [OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) 数据集上完成强化学习(Reinforcement Learning,RL)训练。
本OpenThinker-Agent-v1-SFT模型即为完成监督微调阶段后的模型。如需查看同时完成监督微调与强化学习两个阶段的模型,请访问 [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)。
- **主页:** https://www.open-thoughts.ai/agent
- **代码仓库:** https://github.com/open-thoughts/OpenThoughts-Agent
# OpenThinker-Agent-v1 模型性能
本 [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) 模型在同参数量级的智能体基准测试中处于当前最优水平。
| 模型 | 基准测试框架 | Terminal-Bench 2.0 | SWE-Bench 验证集 | OpenThoughts-TB-Dev |
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 |
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 |
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 |
# 数据集详情
我们分两个阶段构建OpenThinker-Agent-v1:首先进行监督微调,随后开展强化学习。每个阶段都需要专属的数据流水线:强化学习阶段包含任务指令、运行环境与验证器,监督微调阶段的数据则来自优秀教师智能体完成任务时产生的交互轨迹。
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) 是一个监督微调轨迹数据集,包含约15200条交互轨迹,数据来源于我们遴选的两个不同数据源:
- **nl2bash**:简单的合成生成任务,要求智能体高效编写Shell命令
- **InferredBugs**:由微软收集的C#与Java代码缺陷集合,我们将其转化为智能体任务
我们使用QuantTrio/GLM-4.6-AWQ与Terminus-2智能体基准测试框架生成轨迹,单条轨迹最大交互轮次为32轮。我们采用vLLM的默认采样参数,最大上下文长度为64K。如需了解更多细节,请参阅我们的示例[数据生成脚本](https://github.com/open-thoughts/OpenThoughts-Agent/blob/main/notebook/datagen_sft_tutorial.ipynb)。
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) 是一个强化学习数据集,包含约720个任务,数据均来自**nl2bash验证集**。
为稳定训练过程,我们构建了三阶段过滤流水线,在任务进入模型训练前完成冗余任务剔除:
1. 无效验证器过滤:剔除验证逻辑不稳定或运行过慢的任务
2. 环境稳定性过滤:移除容器构建或销毁耗时过长的任务
3. 可选难度过滤:剔除即使是高性能模型(如GPT-5 Codex)也无法单次完成的任务
### 训练超参数
以下为训练过程中使用的超参数:
- 学习率:4e-05
- 训练批次大小:1
- 验证批次大小:8
- 随机种子:42
- 分布式训练类型:多GPU
- 可用设备数:16
- 总训练批次大小:16
- 总验证批次大小:128
- 优化器:采用OptimizerNames.ADAMW_TORCH_FUSED优化器,β参数为(0.9, 0.98),ε参数为1e-08,无额外优化器参数
- 学习率调度器类型:余弦衰减
- 学习率调度器预热比例:0.1
- 训练轮次:7.0
### 依赖框架版本
- Transformers:4.56.0
- PyTorch:2.9.0+cu128
- Datasets:4.4.1
- Tokenizers:0.22.1
# 相关链接
- 🌐 [OpenThoughts-Agent 项目主页](https://open-thoughts.ai/blog/agent)
- 💻 [OpenThoughts-Agent GitHub 代码仓库](https://github.com/open-thoughts/OpenThoughts-Agent)
- 🧠 [OpenThoughts-Agent-v1-SFT 数据集](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)
- 🧠 [OpenThoughts-Agent-v1-RL 数据集](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)
- 🧠 [OpenThoughts-TB-dev 数据集](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)
- 🤖 [OpenThinker-Agent-v1 模型](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)
- 🤖 [OpenThinker-Agent-v1-SFT 模型](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT) —— 即本模型
# 引用
@misc{openthoughts-agent,
author = {Team, OpenThoughts-Agent},
month = Dec,
title = {{OpenThoughts-Agent}},
howpublished = {https://open-thoughts.ai/agent},
year = {2025}
}
提供机构:
maas
创建时间:
2025-12-06



