yananchen/skillrl_sft_alfworld_prompt_completion

Name: yananchen/skillrl_sft_alfworld_prompt_completion
Creator: yananchen
Published: 2026-04-10 01:13:19
License: No description available

Source: Hugging Face. Updated 2026-04-10. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/yananchen/skillrl_sft_alfworld_prompt_completion


Resource description:

---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: completion
    dtype: string
  splits:
  - name: train
    num_bytes: 42191393
    num_examples: 7486
  download_size: 5617787
  dataset_size: 42191393
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

This dataset comes from the paper `SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning`.

- Rows: 7486
- Distinct parsed tasks: 237
- Contiguous same-task runs: 499
- Contiguous run length:
  - min: 4
  - mean: 15.0
  - max: 38
- Rows with \<action\>...\</action\>: 7486/7486
- Rows where the output action appears exactly in the admissible action list: 7486/7486
- Invalid action mismatch count: 0

To fine-tune an open-source LLM such as Qwen on this dataset with TRL:

```bash
CUDA_VISIBLE_DEVICES=1 trl sft \
  --model_name_or_path Qwen/Qwen3-0.6B \
  --dataset_name yananchen/skillrl_sft_alfworld_prompt_completion \
  --report_to none \
  --learning_rate 1e-4 \
  --lr_scheduler_type cosine \
  --warmup_steps 0.03 \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 8 \
  --output_dir ~/agents/SkillRL/qwen3_0p6B_sft \
  --num_train_epochs 4 \
  --save_strategy epoch \
  --max_steps -1 \
  --gradient_checkpointing \
  --logging_strategy epoch \
  --packing False \
  --do_eval False \
  --bf16 True \
  --dtype bfloat16 \
  --max_length 2048 \
  --use_peft \
  --lora_r 16 \
  --lora_alpha 16 \
  --save_only_model True \
  --lora_target_modules v_proj q_proj \
  --load_in_8bit \
  --attn_implementation sdpa
```
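The row-level statistics in the description (action-tag coverage, distinct tasks, contiguous same-task runs) could be recomputed roughly as follows. This is a minimal sketch, not the author's script. The card does not say how a task is parsed from a prompt, so the task identifiers are assumed to be supplied by the caller, and the helper names `extract_action`, `run_lengths`, `summarize`, and `load_completions` are hypothetical.

```python
import re
from itertools import groupby

# Matches the first <action>...</action> pair in a completion string.
ACTION_RE = re.compile(r"<action>(.*?)</action>", re.DOTALL)


def extract_action(completion):
    """Return the stripped text inside the first <action> tag, or None."""
    match = ACTION_RE.search(completion)
    return match.group(1).strip() if match else None


def run_lengths(task_ids):
    """Lengths of contiguous runs of identical task ids, in row order."""
    return [sum(1 for _ in group) for _, group in groupby(task_ids)]


def summarize(completions, task_ids):
    """Compute card-style statistics.

    task_ids is an assumption: one identifier per row, produced by
    whatever prompt parsing the caller chooses.
    """
    actions = [extract_action(c) for c in completions]
    runs = run_lengths(task_ids)
    return {
        "rows": len(completions),
        "rows_with_action": sum(a is not None for a in actions),
        "distinct_tasks": len(set(task_ids)),
        "runs": len(runs),
        "run_min": min(runs),
        "run_mean": sum(runs) / len(runs),
        "run_max": max(runs),
    }


def load_completions():
    """Fetch the `completion` column from the Hub (needs `datasets` and network)."""
    from datasets import load_dataset

    ds = load_dataset(
        "yananchen/skillrl_sft_alfworld_prompt_completion", split="train"
    )
    return list(ds["completion"])
```

As a consistency check on the reported figures, 7486 rows over 499 runs gives a mean run length of about 15.0, which matches the card.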

Provided by:

yananchen
