Sellopale/OpenThoughts-Agent-v1-SFT

Name: Sellopale/OpenThoughts-Agent-v1-SFT
Creator: Sellopale
Published: 2025-12-15 13:25:33
License: No description available

Source: Hugging Face. Updated 2025-12-15; indexed 2025-12-20.

Download link:

https://hf-mirror.com/datasets/Sellopale/OpenThoughts-Agent-v1-SFT

Description:

OpenThoughts-Agent-v1-SFT is a supervised fine-tuning (SFT) trace dataset containing approximately 15,200 traces drawn from two data sources we curate: nl2bash (simple, synthetically generated tasks in which the agent has to format shell commands effectively) and InferredBugs (a set of C# and Java bugs collected by Microsoft that we turned into tasks). The traces were generated using QuantTrio/GLM-4.6-AWQ with the Terminus-2 agentic harness, with a maximum of 32 turns. We used the default sampling parameters from vLLM and a maximum context length of 64K.

OpenThoughts-Agent-v1-RL is a reinforcement learning (RL) dataset containing ~720 tasks drawn from the nl2bash verified dataset. To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they ever reach the learner:

1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers.
2. Environment stability: remove tasks whose containers take too long to build or tear down.
3. Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
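
The three filtering stages described above can be sketched roughly as follows. This is only an illustration of the idea, not the authors' pipeline: the `TaskStats` fields, the `filter_tasks` function, and both time thresholds are hypothetical, since the description does not state how flakiness or "too long" were measured.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical per-task measurements; all field names are illustrative."""
    name: str
    verifier_flaky: bool       # verifier gave inconsistent verdicts on repeated runs
    verifier_seconds: float    # wall-clock time of one verifier run
    build_seconds: float       # container build time
    teardown_seconds: float    # container teardown time
    solved_in_one_pass: bool   # a strong reference model solved it in a single attempt


def filter_tasks(tasks, max_verifier_seconds=60.0, max_env_seconds=300.0,
                 use_difficulty_filter=False):
    """Return the tasks that survive all enabled stages (thresholds are assumed)."""
    kept = []
    for task in tasks:
        # Stage 1: bad verifiers (flaky or excessively slow).
        if task.verifier_flaky or task.verifier_seconds > max_verifier_seconds:
            continue
        # Stage 2: environment stability (slow container build or teardown).
        if (task.build_seconds > max_env_seconds
                or task.teardown_seconds > max_env_seconds):
            continue
        # Stage 3 (optional): drop tasks the strong model could not solve in one pass.
        if use_difficulty_filter and not task.solved_in_one_pass:
            continue
        kept.append(task)
    return kept
```

Keeping the difficulty stage behind a flag mirrors its "optional" status in the description: the first two stages remove tasks that would destabilize training regardless of the learner, while the third depends on how hard a task set is wanted.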

Provider:

Sellopale
