Sizzing/aws-rl-sft
收藏Hugging Face2026-04-22 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/Sizzing/aws-rl-sft
下载链接
链接失效反馈官方服务:
资源简介:
AWS RL Env — SFT数据集是一个监督微调数据集,用于训练通过CLI操作AWS基础设施的大型语言模型(LLM)代理。该数据集是为aws-rl-env强化学习环境构建的,该环境在容器中模拟34个AWS服务(MiniStack),并通过单命令步骤奖励代理完成云操作任务。数据集是SFT → GRPO管道的冷启动阶段的一部分,包括使用LoRA的SFT和基于课程的GRPO。数据集的模式包括task_id、difficulty、source、step_idx和messages。数据集由各种来源和层级组成,分为训练、验证和保留集。README还提供了使用`trl.SFTTrainer`和LoRA加载、过滤和训练数据集的快速入门指南。数据集是完全合成的,不需要教师LLM,包括失败恢复行和提示变化。许可证为Apache 2.0。
The AWS RL Env — SFT Dataset is a supervised fine-tuning dataset for training a Large Language Model (LLM) agent that operates AWS infrastructure via the CLI. Built for the aws-rl-env reinforcement-learning environment, which emulates 34 AWS services in-container (MiniStack) and rewards agents for completing cloud-operations tasks via single-command steps. The dataset is designed as the cold-start phase of an SFT → GRPO pipeline, including SFT with LoRA and GRPO on curriculum. The schema includes task_id, difficulty, source, step_idx, and messages. The dataset is composed of various sources and tiers, with splits for training, validation, and reserve. The README provides quickstart instructions for loading, filtering, and training with the dataset using `trl.SFTTrainer` and LoRA. The dataset is fully synthetic, with no teacher LLM required, and includes failure-recovery rows and prompt variance. The license is Apache 2.0.
提供机构:
Sizzing



