
yananchen/skillrl_sft_alfworld_prompt_completion

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/yananchen/skillrl_sft_alfworld_prompt_completion
Description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: completion
    dtype: string
  splits:
  - name: train
    num_bytes: 42191393
    num_examples: 7486
  download_size: 5617787
  dataset_size: 42191393
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

The dataset comes from the paper `SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning`.

- Rows: 7486
- Distinct parsed tasks: 237
- Contiguous same-task runs: 499
- Contiguous run length:
  - min: 4
  - mean: 15.0
  - max: 38
- Rows with `<action>...</action>`: 7486/7486
- Rows where the output action appears exactly in the admissible action list: 7486/7486
- Invalid action mismatch count: 0

To fine-tune an open-source LLM such as Qwen on this dataset with TRL:

```bash
CUDA_VISIBLE_DEVICES=1 trl sft \
  --model_name_or_path Qwen/Qwen3-0.6B \
  --dataset_name yananchen/skillrl_sft_alfworld_prompt_completion \
  --report_to none \
  --learning_rate 1e-4 \
  --lr_scheduler_type cosine \
  --warmup_steps 0.03 \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 8 \
  --output_dir ~/agents/SkillRL/qwen3_0p6B_sft \
  --num_train_epochs 4 \
  --save_strategy epoch \
  --max_steps -1 \
  --gradient_checkpointing \
  --logging_strategy epoch \
  --packing False \
  --do_eval False \
  --bf16 True \
  --dtype bfloat16 \
  --max_length 2048 \
  --use_peft \
  --lora_r 16 \
  --lora_alpha 16 \
  --save_only_model True \
  --lora_target_modules v_proj q_proj \
  --load_in_8bit \
  --attn_implementation sdpa
```
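The integrity statistics in the dataset card (action tags, admissible-action matches, contiguous run lengths) could be recomputed along the following lines. This is only a minimal sketch, not the author's validation script: the helper names `extract_action`, `is_admissible`, and `run_lengths` are hypothetical, the assumption that the action tag sits in the `completion` field is mine, and the toy strings below are made up for illustration rather than taken from the dataset. How the admissible action list and the task identity are parsed out of each `prompt` is not described in the card, so both are passed in as plain Python values here.

```python
import re
from itertools import groupby

# Assumption: each completion wraps its chosen action in <action>...</action>.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(completion):
    """Return the text inside the first <action>...</action> pair, or None."""
    match = ACTION_RE.search(completion)
    return match.group(1).strip() if match else None


def is_admissible(completion, admissible_actions):
    """True if the tagged action exactly matches one admissible action."""
    action = extract_action(completion)
    return action is not None and action in admissible_actions


def run_lengths(task_ids):
    """Lengths of contiguous runs of identical task identifiers, in order."""
    return [sum(1 for _ in group) for _, group in groupby(task_ids)]


# Toy, made-up inputs (not real dataset rows).
toy_completion = "I should check the desk first. <action>go to desk 1</action>"
toy_admissible = ["go to desk 1", "go to shelf 2", "look"]
toy_task_ids = ["task_a", "task_a", "task_b", "task_a"]

print(extract_action(toy_completion))
print(is_admissible(toy_completion, toy_admissible))
print(run_lengths(toy_task_ids))
```

Applied to the full train split, the mean of `run_lengths(...)` should be consistent with the reported figures, since 7486 rows over 499 runs is about 15.0 rows per run.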
Provider:
yananchen