disserji1/OpenThoughts-Agent-v1-RL
Source: Hugging Face. Updated: 2025-12-16. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/disserji1/OpenThoughts-Agent-v1-RL
Description:
OpenThoughts-Agent-v1-RL is a curated reinforcement learning (RL) dataset of about 720 tasks for agentic training. Each task comes with an instruction, an environment, and a verifier. The dataset is used in the RL stage of training the OpenThinker-Agent-v1 model. Its tasks are drawn from the nl2bash verified dataset and passed through a three-stage filtration pipeline for quality assurance. The release also includes a supervised fine-tuning (SFT) trace dataset, OpenThoughts-Agent-v1-SFT, which contains approximately 15,200 traces from two data sources: nl2bash and InferredBugs.
Provider:
disserji1



