OpenThoughts-Agent-RLv1
ModelScope community · Updated 2025-12-12 · Indexed 2025-12-13
Download link:
https://modelscope.cn/datasets/open-thoughts/OpenThoughts-Agent-RLv1
Resource description:
<p align="center">
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%">
</p>
<p align="center">
<a href="https://open-thoughts.ai/agent" style="margin-right: 24px;">project</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a>
</p>
# OpenThoughts-Agent-v1-RL
## Dataset Description
- **Homepage:** https://www.open-thoughts.ai/agent
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent
An open-source, state-of-the-art RL dataset for agent training, with ~720 tasks.
To view the exact tasks:
- Visit our [trace viewer](https://ot-agent-trace-viewer.replit.app/tasks/penfever%2Fnl2bash-verified-tasks-cleaned).
- Download the dataset as a repository and run `python extract_parquet_tasks.py tasks_new.parquet ./extracted_tasks` to convert it to viewable task folders.
**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent) and our [research codebase](https://github.com/open-thoughts/OpenThoughts-Agent).
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as **Terminal-Bench 2.0** and **SWE-Bench**.
We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**. Each stage required its own data pipeline: RL tasks (instructions, environments, and verifiers) for the RL stage, and SFT traces from strong teacher agents completing tasks for the SFT stage.
We are excited to release **OpenThoughts-Agent-v1-SFT** and **OpenThoughts-Agent-v1-RL**, our first official OpenThoughts-Agent datasets!
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two different data sources we curate:
- **nl2bash**: Simple synthetically generated tasks where the agent has to format shell commands effectively
- **InferredBugs**: A set of bugs in C# and Java collected by Microsoft that we turned into tasks
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset.
You can find the dataset in both parquet and viewable formats.
To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever hit the learner:
1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers.
2. Environment stability: remove tasks whose containers take too long to build or tear down.
3. Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
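As a rough illustration of the filtering described above, the sketch below applies the three filters in order. The `TaskStats` fields, the two thresholds, and the function name are hypothetical and not taken from the OpenThoughts-Agent codebase; the real pipeline may measure these signals quite differently.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TaskStats:
    """Hypothetical per-task measurements; field names are illustrative only."""
    name: str
    verifier_flaky: bool          # verifier gave inconsistent results on reruns
    verifier_seconds: float       # wall-clock time of one verifier run
    build_seconds: float          # container build time
    teardown_seconds: float       # container teardown time
    solved_by_strong_model: bool  # a strong model solved it in a single pass


# Assumed thresholds, chosen only for this example.
MAX_VERIFIER_SECONDS = 60.0
MAX_ENV_SECONDS = 300.0


def filter_tasks(tasks: Iterable[TaskStats],
                 use_difficulty_filter: bool = True) -> List[TaskStats]:
    """Return the tasks that survive all enabled filters."""
    kept = []
    for t in tasks:
        # Stage 1: drop flaky or excessively slow verifiers.
        if t.verifier_flaky or t.verifier_seconds > MAX_VERIFIER_SECONDS:
            continue
        # Stage 2: drop environments that are slow to build or tear down.
        if t.build_seconds > MAX_ENV_SECONDS or t.teardown_seconds > MAX_ENV_SECONDS:
            continue
        # Stage 3 (optional): drop tasks the strong model could not solve.
        if use_difficulty_filter and not t.solved_by_strong_model:
            continue
        kept.append(t)
    return kept


# Made-up candidates, one per outcome.
candidates = [
    TaskStats("ok", False, 2.0, 30.0, 5.0, True),
    TaskStats("flaky", True, 2.0, 30.0, 5.0, True),
    TaskStats("slow-build", False, 2.0, 900.0, 5.0, True),
    TaskStats("too-hard", False, 2.0, 30.0, 5.0, False),
]
print([t.name for t in filter_tasks(candidates)])
```

Running the cheap checks first means the optional difficulty filter, which would require a model rollout in practice, only has to be evaluated on tasks that already passed the verifier and environment checks.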
We define a **task** as a triplet: an instruction in the form of a markdown file, an environment defined by a Dockerfile, and a verifier in the form of pytest tests. (The verifier is optional in the SFT setting.) All of our environments in this release are generic Ubuntu Dockerfiles.
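One way to picture this triplet in code is a small record type. This is only a sketch: the `Task` class, its field names, the file names `instruction.md` and `tests/test_task.py`, the `ubuntu:24.04` tag, and the example task itself are assumptions made for illustration, not the layout used by the released dataset.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class Task:
    instruction_md: str                # markdown instruction shown to the agent
    dockerfile: str                    # environment definition
    verifier_py: Optional[str] = None  # pytest source; may be absent for SFT

    def is_rl_ready(self) -> bool:
        """RL needs a verifier to produce a reward; SFT does not."""
        return self.verifier_py is not None

    def write_to(self, folder: Path) -> None:
        """Write the task as a folder (hypothetical layout)."""
        folder.mkdir(parents=True, exist_ok=True)
        (folder / "instruction.md").write_text(self.instruction_md, encoding="utf-8")
        (folder / "Dockerfile").write_text(self.dockerfile, encoding="utf-8")
        if self.verifier_py is not None:
            tests = folder / "tests"
            tests.mkdir(exist_ok=True)
            (tests / "test_task.py").write_text(self.verifier_py, encoding="utf-8")


# A made-up task, not one from the dataset.
example = Task(
    instruction_md="# Task\nCount the lines in /data/input.txt and write the number to /data/count.txt.\n",
    dockerfile="FROM ubuntu:24.04\n",
    verifier_py=(
        "import os\n\n"
        "def test_count_file_exists():\n"
        "    assert os.path.exists('/data/count.txt')\n"
    ),
)

with tempfile.TemporaryDirectory() as tmp:
    example.write_to(Path(tmp) / "example_task")
    # Collect the written files relative to the temporary directory.
    written = sorted(
        p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*") if p.is_file()
    )
    print(written)
```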
# OpenThinker-Agent-v1 Model Performance
Our [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is the state-of-the-art model at its scale on agent benchmarks.
| Model | Harness | Terminal-Bench 2.0 | SWE-Bench Verified | OpenThoughts-TB-Dev |
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 |
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 |
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 |
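To make the comparison at the 8B scale explicit, the short sketch below recomputes the absolute difference between the OpenThinker-Agent-v1 and Qwen3-8B rows, both run with the Terminus-2 harness. The numbers are copied from the table; the variable names are illustrative.

```python
# Scores copied from the table above (both rows use the Terminus-2 harness).
scores = {
    "Qwen3-8B": {
        "Terminal-Bench 2.0": 0.0,
        "SWE-Bench Verified": 0.7,
        "OpenThoughts-TB-Dev": 5.7,
    },
    "OpenThinker-Agent-v1": {
        "Terminal-Bench 2.0": 4.9,
        "SWE-Bench Verified": 15.7,
        "OpenThoughts-TB-Dev": 17.3,
    },
}

# Absolute difference in points, rounded to one decimal like the table.
deltas = {
    bench: round(scores["OpenThinker-Agent-v1"][bench] - scores["Qwen3-8B"][bench], 1)
    for bench in scores["Qwen3-8B"]
}

for bench, delta in deltas.items():
    print(f"{bench}: +{delta} points")
```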
# Data Curation and Scaling Recipe
We ablated **15 different approaches**, selecting from both existing sources such as Nemo, SWESmith and Mind2Web, and those we created, such as StackExchange Overflow, Freelancer and Taskmaster.
For each source, we:
1. 🎯 **Generate approximately 10,000 tasks** from the data source
2. 🤖 **Let GPT-5-Nano solve each task once**, resulting in a 10,000-trace SFT dataset
3. 📊 **Evaluate each model on our OpenThoughts-TB-Dev dev set**
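The per-source recipe above can be sketched as a simple loop. Everything here is a placeholder: `generate_tasks`, `solve_once`, and `score_on_dev` are hypothetical callables standing in for task generation, the single teacher rollout, and the dev-set evaluation. How a model is obtained from each trace dataset before evaluation is not described here, so that step is folded into `score_on_dev`. The toy stubs exist only so the example runs, and their scores are meaningless.

```python
from typing import Callable, Dict, List, Sequence


def ablate_sources(
    sources: Sequence[str],
    generate_tasks: Callable[[str, int], List[str]],
    solve_once: Callable[[str], str],
    score_on_dev: Callable[[List[str]], float],
    n_tasks: int = 10_000,
) -> Dict[str, float]:
    """Run the same generate / solve / evaluate recipe for every data source."""
    results: Dict[str, float] = {}
    for source in sources:
        tasks = generate_tasks(source, n_tasks)         # step 1: ~n_tasks tasks
        traces = [solve_once(task) for task in tasks]   # step 2: one trace per task
        results[source] = score_on_dev(traces)          # step 3: dev-set score
    return results


# Toy stand-ins so the sketch is runnable; they do not model real behavior.
def toy_generate(source: str, n: int) -> List[str]:
    return [f"{source}-task-{i}" for i in range(n)]


def toy_solve(task: str) -> str:
    return f"trace for {task}"


def toy_score(traces: List[str]) -> float:
    return float(len(traces))


toy_results = ablate_sources(
    ["source-a", "source-b"], toy_generate, toy_solve, toy_score, n_tasks=3
)
print(toy_results)
```

Holding the task count and the teacher fixed across sources is what makes the resulting dev-set scores comparable between sources.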
## Teacher Model Selection
One factor we ablated was the **choice of teacher**. One would expect that the better the teacher model is on TerminalBench, the better the model trained on its traces performs; surprisingly, we find that this is not the case.
Rather, varying teachers within the GPT model family did not improve performance, up to and including the best model on TerminalBench itself, GPT-5. However, using **GLM-4.6 as a teacher** led to almost a **2x improvement in downstream score**.
# Links
- 🌐 [OpenThoughts-Agent Project Page](https://open-thoughts.ai/agent)
- 💻 [OpenThoughts-Agent GitHub Repository](https://github.com/open-thoughts/OpenThoughts-Agent)
- 🧠 [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)
- 🧠 [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)
- 🧠 [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)
- 🤖 [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)
- 🤖 [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT)
# Citation
```
@misc{openthoughts-agent,
author = {Team, OpenThoughts-Agent},
month = Dec,
title = {{OpenThoughts-Agent}},
howpublished = {https://open-thoughts.ai/agent},
year = {2025}
}
```
Provider: maas
Created: 2025-12-05



