yananchen/skillrl_sft_alfworld_prompt_completion
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yananchen/skillrl_sft_alfworld_prompt_completion
Dataset description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: completion
    dtype: string
  splits:
  - name: train
    num_bytes: 42191393
    num_examples: 7486
  download_size: 5617787
  dataset_size: 42191393
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
This dataset comes from the paper `SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning`.
- Rows: 7486
- Distinct parsed tasks: 237
- Contiguous same-task runs: 499
- Contiguous run length:
  - min: 4
  - mean: 15.0
  - max: 38
- Rows with \<action\>...\</action\>: 7486/7486
- Rows where output action exactly appears in the admissible action list: 7486/7486
- Invalid action mismatch count: 0
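The statistics above could be recomputed with something like the following minimal sketch. It is not the author's script. Two things are assumptions: the admissibility check is approximated as "the extracted action appears verbatim somewhere in the prompt text", and `task_key` is a hypothetical caller-supplied function that maps a row to its task, since the card does not say how tasks were parsed.

```python
import re
from itertools import groupby

# Matches the first <action>...</action> pair in a completion.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(completion):
    """Return the text inside the first <action> tag pair, or None if absent."""
    match = ACTION_RE.search(completion)
    return match.group(1).strip() if match else None


def run_lengths(task_keys):
    """Lengths of contiguous runs of identical task keys, in row order."""
    return [sum(1 for _ in group) for _, group in groupby(task_keys)]


def summarize(rows, task_key):
    """Compute card-style statistics for rows with 'prompt' and 'completion'.

    task_key is a hypothetical function mapping a row to a task identifier.
    """
    actions = [extract_action(row["completion"]) for row in rows]
    tagged = sum(action is not None for action in actions)
    # Assumption: "admissible" means the action string occurs verbatim in the
    # prompt; the check used for the card may be stricter.
    admissible = sum(
        action is not None and action in row["prompt"]
        for action, row in zip(actions, rows)
    )
    keys = [task_key(row) for row in rows]
    lengths = run_lengths(keys)
    return {
        "rows": len(rows),
        "distinct_tasks": len(set(keys)),
        "runs": len(lengths),
        "run_min": min(lengths) if lengths else 0,
        "run_mean": sum(lengths) / len(lengths) if lengths else 0.0,
        "run_max": max(lengths) if lengths else 0,
        "tagged": tagged,
        "admissible": admissible,
    }
```

To run it on the real data, the rows can be loaded with `datasets.load_dataset("yananchen/skillrl_sft_alfworld_prompt_completion", split="train")` and passed to `summarize` together with a `task_key` that fits the actual prompt layout.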
To fine-tune an open-source LLM such as Qwen on this dataset with TRL:
```bash
CUDA_VISIBLE_DEVICES=1 trl sft \
--model_name_or_path Qwen/Qwen3-0.6B \
--dataset_name yananchen/skillrl_sft_alfworld_prompt_completion \
--report_to none \
--learning_rate 1e-4 \
--lr_scheduler_type cosine \
--warmup_steps 0.03 \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 8 \
--output_dir ~/agents/SkillRL/qwen3_0p6B_sft \
--num_train_epochs 4 \
--save_strategy epoch \
--max_steps -1 \
--gradient_checkpointing \
--logging_strategy epoch \
--packing False \
--do_eval False \
--bf16 True \
--dtype bfloat16 \
--max_length 2048 \
--use_peft \
--lora_r 16 \
--lora_alpha 16 \
--save_only_model True \
--lora_target_modules v_proj q_proj \
--load_in_8bit \
--attn_implementation sdpa
```
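As a rough sanity check on the command above, the sketch below estimates the effective batch size and the number of optimizer updates. It assumes a single visible GPU (which `CUDA_VISIBLE_DEVICES=1` implies), that no rows are dropped, and that partial batches are rounded up; the exact step count reported by the trainer may differ slightly between library versions. The function name `optimizer_steps` is illustrative and not part of TRL.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Estimate effective batch size and optimizer updates for a training run."""
    # Mini-batches the dataloader yields per epoch (last partial batch kept).
    batches_per_epoch = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer update happens every grad_accum mini-batches.
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return {
        "effective_batch": per_device_batch * grad_accum * num_devices,
        "updates_per_epoch": updates_per_epoch,
        "total_updates": updates_per_epoch * epochs,
    }


# Values taken from the dataset card (7486 rows) and the command above.
plan = optimizer_steps(num_examples=7486, per_device_batch=4, grad_accum=8, epochs=4)
print(plan)
# {'effective_batch': 32, 'updates_per_epoch': 234, 'total_updates': 936}
```

Under these assumptions each epoch is about 234 updates, so `--save_strategy epoch` and `--logging_strategy epoch` would produce four checkpoints and four log entries in total.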
Provider:
yananchen



