Sellopale/OpenThoughts-Agent-v1-SFT
Source: Hugging Face. Updated 2025-12-15; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/Sellopale/OpenThoughts-Agent-v1-SFT
Description:
OpenThoughts-Agent-v1-SFT is a supervised fine-tuning (SFT) trace dataset containing approximately 15,200 traces drawn from two data sources we curate: nl2bash (simple, synthetically generated tasks where the agent has to format shell commands effectively) and InferredBugs (a set of C# and Java bugs collected by Microsoft that we turned into tasks). The traces were generated using QuantTrio/GLM-4.6-AWQ and the Terminus-2 agentic harness, with a maximum of 32 turns. We use vLLM's default sampling parameters and a maximum context length of 64K.

OpenThoughts-Agent-v1-RL is a reinforcement learning (RL) dataset containing about 720 tasks drawn from the nl2bash verified dataset. To stabilize training, we built a three-stage filtration pipeline that prunes tasks before they reach the learner:
1. Bad verifiers filter: drop tasks with flaky or excessively slow verifiers.
2. Environment stability: remove tasks whose containers take too long to build or tear down.
3. Optional difficulty filter: discard tasks that even a strong model (GPT-5 Codex) cannot solve in a single pass.
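The three-stage filtration pipeline described above could be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `TaskStats` record, its field names, the helper functions, and all threshold values are hypothetical, since the description does not say how flakiness or "too long" were measured.

```python
from dataclasses import dataclass


@dataclass
class TaskStats:
    """Hypothetical per-task measurements (field names are assumptions)."""
    task_id: str
    verifier_results: list       # outcomes of repeated verifier runs on the same solution
    verifier_seconds: float      # typical verifier runtime
    build_seconds: float         # container build time
    teardown_seconds: float      # container teardown time
    strong_model_solved: bool    # whether a strong model solved the task in a single pass


def has_bad_verifier(task, max_verifier_seconds):
    # Stage 1: a verifier is treated as flaky if repeated runs disagree,
    # and as too slow if it exceeds the (assumed) time budget.
    flaky = len(set(task.verifier_results)) > 1
    slow = task.verifier_seconds > max_verifier_seconds
    return flaky or slow


def has_unstable_environment(task, max_build_seconds, max_teardown_seconds):
    # Stage 2: containers that take too long to build or tear down.
    return (task.build_seconds > max_build_seconds
            or task.teardown_seconds > max_teardown_seconds)


def filter_tasks(tasks,
                 max_verifier_seconds=60.0,    # assumed threshold
                 max_build_seconds=600.0,      # assumed threshold
                 max_teardown_seconds=120.0,   # assumed threshold
                 use_difficulty_filter=False):
    """Return the tasks that survive all enabled filter stages."""
    kept = []
    for task in tasks:
        if has_bad_verifier(task, max_verifier_seconds):
            continue
        if has_unstable_environment(task, max_build_seconds, max_teardown_seconds):
            continue
        # Stage 3 (optional): drop tasks the strong model could not solve.
        if use_difficulty_filter and not task.strong_model_solved:
            continue
        kept.append(task)
    return kept
```

Running the cheap checks first means a task with a broken verifier or an unstable container is rejected without consulting the difficulty result, which in practice would be the most expensive measurement to collect.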
Provider:
Sellopale



