OpenThoughts-Agent-v1-RL
ModelScope community · Updated 2026-01-06 · Indexed 2026-01-10
Download link: https://modelscope.cn/datasets/open-thoughts/OpenThoughts-Agent-v1-RL
<p align="center">
<img src="https://huggingface.co/datasets/open-thoughts/OpenThoughts1-Agent-SFT/resolve/main/ota-logo.png" width="50%">
</p>
<p align="center">
<a href="https://www.openthoughts.ai/blog/agent" style="margin-right: 24px;">Project</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT dataset</a> |
<a href="https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL" style="margin-right: 24px; margin-left: 24px;">RL dataset</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT" style="margin-right: 24px; margin-left: 24px;">SFT model</a> |
<a href="https://huggingface.co/open-thoughts/OpenThinker-Agent-v1" style="margin-left: 24px;">RL model</a>
</p>
# OpenThoughts-Agent-v1-RL
A curated RL dataset of ~720 tasks with instructions, environments, and verifiers for agentic training.
## Dataset Description
- **Homepage:** https://www.openthoughts.ai/blog/agent
- **Repository:** https://github.com/open-thoughts/OpenThoughts-Agent
**OpenThoughts-Agent** is an open-source effort to curate the best datasets for training agents. Our first release includes [datasets](https://huggingface.co/collections/open-thoughts/openthinker-agent), [models](https://huggingface.co/collections/open-thoughts/openthinker-agent) and our [research codebase](https://github.com/open-thoughts/OpenThoughts-Agent).
[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) is a model trained for agentic tasks such as **Terminal-Bench 2.0** and **SWE-Bench**.
We built [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) in two stages: **supervised fine-tuning**, followed by **reinforcement learning**. Each stage required its own data pipeline – RL tasks (instructions, environments, and verifiers) and SFT traces from strong teacher agents completing tasks.
We are excited to release **OpenThoughts-Agent-v1-SFT** and **OpenThoughts-Agent-v1-RL**, our first official OpenThoughts-Agent datasets!
[OpenThoughts-Agent-v1-SFT](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT) is an SFT trace dataset containing approximately **15,200 traces** drawn from two different data sources we curate:
- **nl2bash**: Simple synthetically generated tasks where the agent has to format shell commands effectively
- **InferredBugs**: A set of bugs in C# and Java collected by Microsoft that we turned into tasks
[OpenThoughts-Agent-v1-RL](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL) is an RL dataset containing ~720 tasks drawn from the **nl2bash verified** dataset.
To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever hit the learner:
1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers.
2. Environment stability: remove tasks whose containers take too long to build or tear down.
3. Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
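The three filters above can be read as successive predicates over per-task measurements. The following is a minimal sketch, not the project's actual pipeline: the `TaskStats` fields, the threshold values, and the function names are all hypothetical and serve only to illustrate the order of the stages.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical per-task measurements; not the dataset's real schema."""
    name: str
    verifier_outcomes: list[bool]  # repeated verifier runs on the same state
    verifier_seconds: float        # wall-clock time of one verifier run
    build_seconds: float           # container build time
    teardown_seconds: float        # container teardown time
    solved_by_strong_model: bool   # result of a single attempt by a strong model


def verifier_ok(t: TaskStats, max_seconds: float = 60.0) -> bool:
    # Stage 1: a verifier counts as flaky if repeated runs disagree.
    consistent = len(set(t.verifier_outcomes)) <= 1
    return consistent and t.verifier_seconds <= max_seconds


def environment_ok(t: TaskStats, max_seconds: float = 300.0) -> bool:
    # Stage 2: drop tasks whose containers are slow to build or tear down.
    return t.build_seconds <= max_seconds and t.teardown_seconds <= max_seconds


def filter_tasks(tasks, use_difficulty_filter: bool = True):
    kept = []
    for t in tasks:
        if not verifier_ok(t):
            continue
        if not environment_ok(t):
            continue
        # Stage 3 (optional): drop tasks the strong model failed in one pass.
        if use_difficulty_filter and not t.solved_by_strong_model:
            continue
        kept.append(t)
    return kept
```

Passing `use_difficulty_filter=False` keeps the hard tasks, which mirrors the optional nature of the third stage.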
We define a **task** as a triplet of an instruction in the form of a markdown file, an environment defined by a Dockerfile, and a verifier in the form of pytest tests. (The verifier is optional in the SFT setting.) All of our environments in this release are generic Ubuntu Dockerfiles.
## Interacting with the Data
To explore the tasks locally, you can extract them into a readable format using the following commands (make sure `pyarrow` is installed):
```
curl -L -o extract_parquet_tasks.py \
"https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL/raw/main/extract_parquet_tasks.py"
curl -L -o tasks.parquet \
"https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL/resolve/main/tasks.parquet"
python extract_parquet_tasks.py tasks.parquet ./extracted_tasks
```
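Before running the extractor, it may be worth confirming that `curl` fetched a Parquet file rather than, say, an HTML error page. The sketch below relies only on the fact that unencrypted Parquet files begin and end with the four bytes `PAR1`; the helper name `looks_like_parquet` is hypothetical, and the check is a heuristic, not a full validation.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files begin and end with these bytes


def looks_like_parquet(path: str) -> bool:
    """Return True if the file starts and ends with the Parquet magic bytes."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # whence=2 seeks relative to the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Calling `looks_like_parquet("tasks.parquet")` after the download gives a quick signal of whether the file is worth passing to the extraction script.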
If you prefer a web interface, you can browse the dataset directly in our [interactive trace viewer](https://ot-agent-trace-viewer.replit.app/tasks/open-thoughts%2FOpenThoughts-Agent-v1-RL).
# OpenThinker-Agent-v1 Model Performance
Our [OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1) model is the state-of-the-art model at its scale on agent benchmarks.
| Model | Harness | Terminal-Bench 2.0 | SWE-Bench Verified | OpenThoughts-TB-Dev |
| ----------------------------------------------------------------------------------------------- | ------- | ------------------ | --------- | ------------------- |
| [Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) | Terminus-2 | 0.0 | 0.7 | 5.7 |
| **[OpenThinker-Agent-v1](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)** | Terminus-2 | 4.9 | 15.7 | 17.3 |
| [Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) | Terminus-2 | 1.9 | 5.7 | 10.2 |
| [Qwen/Qwen3-Coder-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct) | OpenHands | 10.1 | 49.2 | 24.5 |
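To make the gaps in the table easier to read, the following sketch computes absolute score differences between rows evaluated with the same harness (Terminus-2). The numbers are copied from the table; the `gains` helper is a hypothetical name, and the Qwen3-Coder row is left out because it was run under a different harness (OpenHands).

```python
# Scores copied from the table above (Terminus-2 harness only).
scores = {
    "Qwen3-8B": {
        "Terminal-Bench 2.0": 0.0,
        "SWE-Bench Verified": 0.7,
        "OpenThoughts-TB-Dev": 5.7,
    },
    "OpenThinker-Agent-v1": {
        "Terminal-Bench 2.0": 4.9,
        "SWE-Bench Verified": 15.7,
        "OpenThoughts-TB-Dev": 17.3,
    },
    "Qwen3-32B": {
        "Terminal-Bench 2.0": 1.9,
        "SWE-Bench Verified": 5.7,
        "OpenThoughts-TB-Dev": 10.2,
    },
}


def gains(model: str, baseline: str) -> dict[str, float]:
    """Absolute difference per benchmark, rounded to one decimal place."""
    return {
        bench: round(scores[model][bench] - scores[baseline][bench], 1)
        for bench in scores[model]
    }


print(gains("OpenThinker-Agent-v1", "Qwen3-8B"))
print(gains("OpenThinker-Agent-v1", "Qwen3-32B"))
```

Against Qwen3-8B the differences are 4.9, 15.0, and 11.6 points; against Qwen3-32B they are 3.0, 10.0, and 7.1 points.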
# Data Curation and Scaling Recipe
We ablated **15 different approaches**, selecting from both existing sources such as Nemo, SWESmith and Mind2Web, and those we created, such as StackExchange Overflow, Freelancer and Taskmaster.
For each source, we:
1. 🎯 **Generate approximately 10,000 tasks** from the data source
2. 🤖 **Let GPT-5-Nano solve each task once**, resulting in a 10,000-trace SFT dataset
3. 📊 **Evaluate each model on our OpenThoughts-TB-Dev dev set**
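The per-source recipe above can be sketched as a simple loop. This is a schematic under stated assumptions, not the project's code: the three callables are hypothetical placeholders supplied by the caller, and `train_and_evaluate` assumes that a model is fine-tuned on each trace set before being scored, a step the list implies but does not describe.

```python
from typing import Callable, Iterable


def run_source_ablation(
    sources: Iterable[str],
    generate_tasks: Callable[[str, int], list],
    solve_once: Callable[[object], object],
    train_and_evaluate: Callable[[list], float],
    n_tasks: int = 10_000,
) -> dict[str, float]:
    """Return one dev-set score per data source (all callables are hypothetical)."""
    results: dict[str, float] = {}
    for source in sources:
        tasks = generate_tasks(source, n_tasks)        # step 1: ~10,000 tasks
        traces = [solve_once(task) for task in tasks]  # step 2: one attempt per task
        results[source] = train_and_evaluate(traces)   # step 3: score on the dev set
    return results
```

Keeping the task count and the teacher fixed across sources means that differences in the final score can be attributed to the data source itself.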
## Teacher Model Selection
One factor we ablated was the **choice of teacher**. One would expect that the better the teacher model performs on TerminalBench, the better the model trained on its traces performs; surprisingly, we find that this is not the case.
Rather, varying teachers within the GPT model family did not improve performance, up to and including the best model on TerminalBench itself, GPT-5. However, using **GLM-4.6 as a teacher** led to almost a **2x improvement in downstream score**.
# Links
- 🌐 [OpenThoughts-Agent Project Page](https://www.openthoughts.ai/blog/agent)
- 💻 [OpenThoughts-Agent GitHub Repository](https://github.com/open-thoughts/OpenThoughts-Agent)
- 🧠 [OpenThoughts-Agent-v1-SFT dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-SFT)
- 🧠 [OpenThoughts-Agent-v1-RL dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-Agent-v1-RL)
- 🧠 [OpenThoughts-TB-dev dataset](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TB-dev)
- 🤖 [OpenThinker-Agent-v1 model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1)
- 🤖 [OpenThinker-Agent-v1-SFT model](https://huggingface.co/open-thoughts/OpenThinker-Agent-v1-SFT)
# Citation
```
@misc{openthoughts-agent,
author = {Team, OpenThoughts-Agent},
month = Dec,
title = {{OpenThoughts-Agent}},
howpublished = {https://open-thoughts.ai/agent},
year = {2025}
}
```
Provider: maas
Created: 2025-12-06



