IntelligenceLab/Cos-Play-Cold-Start

Name: IntelligenceLab/Cos-Play-Cold-Start
Creator: IntelligenceLab
Published: 2026-04-06 21:30:38
License: not listed (the dataset card metadata below specifies `mit`)

Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/IntelligenceLab/Cos-Play-Cold-Start


Resource description:

---
license: mit
task_categories:
- reinforcement-learning
language:
- en
tags:
- game-playing
- llm-agent
- cold-start
- skill-labeling
- grpo
pretty_name: "COS-PLAY Cold-Start Data"
size_categories:
- 10K<n<100K
configs:
- config_name: episodes_twenty_forty_eight
  data_files: "data/episodes/twenty_forty_eight.jsonl"
- config_name: episodes_tetris
  data_files: "data/episodes/tetris.jsonl"
- config_name: episodes_candy_crush
  data_files: "data/episodes/candy_crush.jsonl"
- config_name: episodes_super_mario
  data_files: "data/episodes/super_mario.jsonl"
- config_name: episodes_sokoban
  data_files: "data/episodes/sokoban.jsonl"
- config_name: episodes_pokemon_red
  data_files: "data/episodes/pokemon_red.jsonl"
- config_name: episodes_avalon
  data_files: "data/episodes/avalon.jsonl"
- config_name: episodes_diplomacy
  data_files: "data/episodes/diplomacy.jsonl"
- config_name: grpo_action_taking_twenty_forty_eight
  data_files: "data/grpo_coldstart/twenty_forty_eight/action_taking.jsonl"
- config_name: grpo_action_taking_tetris
  data_files: "data/grpo_coldstart/tetris/action_taking.jsonl"
- config_name: grpo_action_taking_candy_crush
  data_files: "data/grpo_coldstart/candy_crush/action_taking.jsonl"
- config_name: grpo_action_taking_super_mario
  data_files: "data/grpo_coldstart/super_mario/action_taking.jsonl"
- config_name: grpo_action_taking_sokoban
  data_files: "data/grpo_coldstart/sokoban/action_taking.jsonl"
- config_name: grpo_action_taking_pokemon_red
  data_files: "data/grpo_coldstart/pokemon_red/action_taking.jsonl"
- config_name: grpo_skill_selection_twenty_forty_eight
  data_files: "data/grpo_coldstart/twenty_forty_eight/skill_selection.jsonl"
- config_name: grpo_skill_selection_tetris
  data_files: "data/grpo_coldstart/tetris/skill_selection.jsonl"
- config_name: grpo_skill_selection_candy_crush
  data_files: "data/grpo_coldstart/candy_crush/skill_selection.jsonl"
- config_name: grpo_skill_selection_super_mario
  data_files: "data/grpo_coldstart/super_mario/skill_selection.jsonl"
- config_name: grpo_skill_selection_sokoban
  data_files: "data/grpo_coldstart/sokoban/skill_selection.jsonl"
- config_name: grpo_skill_selection_pokemon_red
  data_files: "data/grpo_coldstart/pokemon_red/skill_selection.jsonl"
---

# COS-PLAY Cold-Start Data

Pre-generated cold-start data for [COS-PLAY](https://github.com/wuxiyang1996/cos-play) (COLM 2026): **Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Game Play**.

## Dataset Summary

This dataset contains GPT-5.4-generated seed trajectories and skill-labeled episodes for 8 games, used to bootstrap the COS-PLAY co-evolution training loop.

| Game | Episodes | Steps (action) | Steps (skill) |
|------|----------|----------------|---------------|
| 2048 | 60 | 8,125 | varies |
| Tetris | 60 | 3,700 | varies |
| Candy Crush | 60 | 3,000 | varies |
| Super Mario | 60 | 3,043 | varies |
| Sokoban | 59 | 5,204 | varies |
| Pokemon Red | 60 | 11,552 | varies |
| Avalon | 60 | n/a | n/a |
| Diplomacy | 60 | n/a | n/a |

## Dataset Structure

### Episodes (`data/episodes/<game>.jsonl`)

Each line is a full episode with fields:

- `episode_id`: unique episode identifier
- `game_name`: game name
- `experiences`: list of step-level data, each containing:
  - `state`, `action`, `reward`, `next_state`, `done`
  - `summary_state`: structured state summary
  - `intentions`: agent's declared intention at the step
  - `available_actions`: list of legal actions

### GRPO Cold-Start (`data/grpo_coldstart/<game>/`)

Training data for GRPO LoRA fine-tuning of the decision agent:

- **`action_taking.jsonl`**: one row per step: state + actions → chosen action
- **`skill_selection.jsonl`**: one row per step with ≥2 skill candidates: state + candidates → chosen skill

Fields: `type`, `game`, `episode`, `step`, `prompt`, `chosen`, `rejected`

## Usage

### Download with Python

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="IntelligenceLab/Cos-Play-Cold-Start",
    repo_type="dataset",
    local_dir="labeling/output/gpt54_skill_labeled",
)
```

### Download with CLI

```bash
pip install huggingface_hub
huggingface-cli download IntelligenceLab/Cos-Play-Cold-Start \
  --repo-type dataset \
  --local-dir labeling/output/gpt54_skill_labeled
```

### Load with `datasets`

```python
from datasets import load_dataset

# Load episodes for a specific game
ds = load_dataset("IntelligenceLab/Cos-Play-Cold-Start", "episodes_tetris")

# Load GRPO action-taking data
ds = load_dataset("IntelligenceLab/Cos-Play-Cold-Start", "grpo_action_taking_tetris")
```

## Citation

```bibtex
@inproceedings{cosplay2026,
  title={COS-PLAY: Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Game Play},
  author={...},
  booktitle={COLM},
  year={2026}
}
```
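
The episode and GRPO record layouts listed under Dataset Structure can also be read with the standard library alone, without `datasets`. The following is a minimal sketch, not part of the COS-PLAY codebase. The helper names (`iter_jsonl`, `episode_stats`, `count_rows_by_type`) are hypothetical, and the sample records are invented for illustration. It assumes that `reward` is numeric, that `done` is a boolean, and that the `type` field takes the values `action_taking` and `skill_selection`. The card does not confirm any of these details.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def episode_stats(episode: dict) -> dict:
    """Summarize one episode record using the field names from the card."""
    steps = episode["experiences"]
    return {
        "episode_id": episode["episode_id"],
        "game_name": episode["game_name"],
        "num_steps": len(steps),
        # Assumption: `reward` is a number on every step.
        "total_reward": sum(step["reward"] for step in steps),
        # Assumption: `done` is a boolean flag on the final step.
        "finished": bool(steps) and bool(steps[-1]["done"]),
    }


def count_rows_by_type(rows: Iterable[dict]) -> Counter:
    """Count GRPO cold-start rows by their `type` field."""
    return Counter(row["type"] for row in rows)


# Invented sample data; real values in the dataset will differ.
sample_episode_lines = [
    json.dumps({
        "episode_id": "demo-0",
        "game_name": "tetris",
        "experiences": [
            {"state": "s0", "action": "left", "reward": 0.0,
             "next_state": "s1", "done": False,
             "summary_state": "placeholder", "intentions": "placeholder",
             "available_actions": ["left", "right"]},
            {"state": "s1", "action": "right", "reward": 4.0,
             "next_state": "s2", "done": True,
             "summary_state": "placeholder", "intentions": "placeholder",
             "available_actions": ["left", "right"]},
        ],
    }),
]

sample_grpo_lines = [
    json.dumps({"type": "action_taking", "game": "tetris", "episode": "demo-0",
                "step": 0, "prompt": "p0", "chosen": "left", "rejected": "right"}),
    json.dumps({"type": "action_taking", "game": "tetris", "episode": "demo-0",
                "step": 1, "prompt": "p1", "chosen": "right", "rejected": "left"}),
    json.dumps({"type": "skill_selection", "game": "tetris", "episode": "demo-0",
                "step": 1, "prompt": "p1", "chosen": "skill_a", "rejected": "skill_b"}),
]

stats = [episode_stats(ep) for ep in iter_jsonl(sample_episode_lines)]
type_counts = count_rows_by_type(iter_jsonl(sample_grpo_lines))

print(stats)
print(type_counts)
```

To run the same helpers on downloaded data, pass an open file handle instead of the sample list, for example `open("labeling/output/gpt54_skill_labeled/data/episodes/tetris.jsonl", encoding="utf-8")`. Reading line by line keeps memory use low for the larger games, since each episode is parsed and discarded in turn.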

Provider:

IntelligenceLab
