erickfm/mimic-melee-subset

Name: erickfm/mimic-melee-subset
Creator: erickfm
Published: 2026-03-18 20:26:19
License: No description available

Source: Hugging Face. Updated 2026-03-18; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/erickfm/mimic-melee-subset

Resource description:

---
language:
- en
license: cc0-1.0
tags:
- melee
- smash-bros
- slippi
- imitation-learning
- controller-inputs
- fighting-games
- pytorch
pretty_name: MIMIC Melee Subset
size_categories:
- 1K<n<10K
---

# MIMIC Melee Subset

A small subset of [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee) for quick experiments and development iteration. Same format, same preprocessing, just fewer replays.

## Source

Sampled from [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) (~95,102 raw Slippi tournament replays, compiled by **altf4** on the [Slippi Discord](https://discord.gg/slippi), CC0 licensed).

~1,000 replays converted to parquet via [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor), then tensorized via `tools/upload_dataset.py`. Each replay is tensorized from both players' perspectives (1,000 replays x 2 = 2,000 games).

## Dataset statistics

| Split | Games | Frames | Shards |
|-------|-------|--------|--------|
| Train | 1,800 | 16,870,729 | 12 |
| Val | 200 | 1,860,593 | 2 |
| **Total** | **2,000** | **18,731,322** | **14** |

- **Total size:** 26.7 GB
- **Shard size:** ~1.9 GB each
- **Val split:** 10% (seed 42)
- **Format:** Per-game concatenated tensors with offset arrays for dynamic windowing

## Shard format

Each `.pt` file contains a dict:

```python
{
    "states": {feature_name: Tensor},   # normalized game-state features
    "targets": {head_name: Tensor},     # controller-input targets
    "offsets": [int, ...],              # game boundary indices along time axis
    "n_games": int,                     # number of games in this shard
}
```

## Metadata files

| File | Description |
|------|-------------|
| `tensor_manifest.json` | Shard list, game counts, frame counts, train/val split |
| `norm_stats.json` | Per-column mean and standard deviation |
| `cat_maps.json` | Dynamic categorical mappings |
| `stick_clusters.json` | K-means cluster centers for stick positions and shoulder triggers |

## Usage

```bash
git clone https://github.com/erickfm/MIMIC && cd MIMIC
bash setup.sh --repo erickfm/mimic-melee-subset --data-dir data/subset --run --model tiny
```

Or manually:

```python
from huggingface_hub import snapshot_download

snapshot_download("erickfm/mimic-melee-subset", repo_type="dataset", local_dir="data/subset")
```

For the full dataset (~94k replays, 2.59 TB), see [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee).

## Related

- [MIMIC](https://github.com/erickfm/MIMIC): Imitation-learning bot trained on this data
- [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee): Full dataset
- [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7): Raw source replays
- [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor): .slp to parquet converter

## License

CC0 1.0. Public domain.
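
## Example: windowing by game offsets

The shard format stores all games concatenated along the time axis, with `offsets` marking game boundaries for dynamic windowing. The sketch below shows one way such offsets could be turned into training windows that never cross a game boundary. It is not taken from the MIMIC code: the function names `game_spans` and `window_index` are hypothetical, and it assumes that `offsets` lists the start index of each game (optionally followed by the total frame count). The card does not spell out that layout, so check it against a real shard before relying on it.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # (start, end) frame indices, end exclusive


def game_spans(offsets: Sequence[int], total_frames: int) -> List[Span]:
    """Return one (start, end) span per game.

    ASSUMPTION: `offsets` holds the start index of every game. A trailing
    entry equal to `total_frames` is tolerated in case the shard stores one.
    """
    bounds = list(offsets)
    if not bounds or bounds[-1] != total_frames:
        bounds.append(total_frames)
    return list(zip(bounds[:-1], bounds[1:]))


def window_index(spans: Sequence[Span], window: int, stride: int = 1) -> List[Span]:
    """List every fixed-length window that fits entirely inside one game."""
    windows = []
    for start, end in spans:
        # Games shorter than `window` produce an empty range and are skipped.
        for s in range(start, end - window + 1, stride):
            windows.append((s, s + window))
    return windows


# Toy shard: two games of 5 and 4 frames, 9 frames in total.
spans = game_spans([0, 5], total_frames=9)
print(spans)                   # [(0, 5), (5, 9)]
print(window_index(spans, 3))  # [(0, 3), (1, 4), (2, 5), (5, 8), (6, 9)]
```

With a shard loaded through `torch.load`, each `(s, e)` pair could then slice every tensor in `states` and `targets` as `tensor[s:e]`, assuming time is the first axis. Building only the index up front keeps memory use low, since windows are materialized only when sampled.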

Provider:

erickfm
