disserji1/OpenThoughts-Agent-v1-RL

Source: Hugging Face. Updated 2025-12-16; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/disserji1/OpenThoughts-Agent-v1-RL
Description:
OpenThoughts-Agent-v1-RL is a curated reinforcement learning (RL) dataset of about 720 tasks for agentic training. Each task comes with an instruction, an environment, and a verifier. The dataset is used in the RL stage of training the OpenThinker-Agent-v1 model. Its tasks are drawn from the nl2bash verified dataset and passed through a three-stage filtration pipeline for quality assurance. The dataset also includes the supervised fine-tuning (SFT) trace dataset OpenThoughts-Agent-v1-SFT, which contains approximately 15,200 traces from two data sources: nl2bash and InferredBugs.
Provider:
disserji1