erickfm/mimic-melee
Hugging Face · Updated 2026-03-18 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/erickfm/mimic-melee
Resource description:
---
language:
- en
license: cc0-1.0
tags:
- melee
- smash-bros
- slippi
- imitation-learning
- controller-inputs
- fighting-games
- pytorch
pretty_name: MIMIC Melee
size_categories:
- 100K<n<1M
---
# MIMIC Melee
Pretokenized tensor shards for training [MIMIC](https://github.com/erickfm/MIMIC),
an imitation-learning bot for Super Smash Bros. Melee. Each shard is a
ready-to-train PyTorch file containing normalized game-state features and
controller-input targets — no preprocessing needed at load time.
## Source
Built from [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95,102 raw Slippi tournament replays, compiled by **altf4** on the
[Slippi Discord](https://discord.gg/slippi), CC0 licensed).
Raw `.slp` replays are converted to per-frame parquet files using
[slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor),
then tensorized and uploaded via `tools/upload_dataset.py` in streaming mode
with 64 multiprocessing workers.
### Replay to game expansion
Each replay contains two players. Every replay is tensorized from **both
players' perspectives**, doubling the training data:
| | Replays | Games (2x perspectives) |
|---|---|---|
| Train | ~84,788 | 169,575 |
| Val | ~9,421 | 18,841 |
| **Total** | **~94,208** | **188,416** |
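As a rough illustration of the two-perspective expansion, the sketch below turns one frame into two training views by relabeling which player is "self" and which is the opponent. The `p1_`/`p2_` column prefixes, the `self_`/`opp_` output names, and the `expand_perspectives` helper are hypothetical; the actual tensorizer may organize its columns differently.

```python
def expand_perspectives(frame):
    """Return two views of one frame, one from each player's point of view.

    Assumes hypothetical column names prefixed with 'p1_' and 'p2_'.
    """
    views = []
    for me, other in (("p1_", "p2_"), ("p2_", "p1_")):
        view = {}
        for key, value in frame.items():
            if key.startswith(me):
                view["self_" + key[len(me):]] = value
            elif key.startswith(other):
                view["opp_" + key[len(other):]] = value
            else:
                view[key] = value  # shared fields, e.g. the frame index
        views.append(view)
    return views


# One made-up frame with two players.
frame = {"frame_idx": 120, "p1_x": -10.0, "p2_x": 25.0, "p1_stocks": 4, "p2_stocks": 3}
first, second = expand_perspectives(frame)
print(first)
print(second)
```

Each replay therefore yields two games whose frames are identical except for which player's fields sit in the "self" slot.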
## Dataset statistics
| Split | Games | Frames | Shards |
|-------|-------|--------|--------|
| Train | 169,575 | 1,631,777,124 | 582 |
| Val | 18,841 | 180,971,668 | 65 |
| **Total** | **188,416** | **1,812,748,792** | **647** |
- **Total size:** 2.59 TB
- **Shard size:** ~4 GB each
- **Val split:** 10% (seed 42)
- **Format:** Per-game concatenated tensors with offset arrays for dynamic windowing
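For a sense of scale, the figures above can be turned into per-game and per-shard averages. This is plain arithmetic on the table values; the only outside fact used is that Melee runs at 60 frames per second. Because every replay appears once per player perspective, the hours figure counts each moment of gameplay twice.

```python
TRAIN_GAMES, VAL_GAMES = 169_575, 18_841
TRAIN_FRAMES, VAL_FRAMES = 1_631_777_124, 180_971_668
TRAIN_SHARDS, VAL_SHARDS = 582, 65
FPS = 60  # Melee runs at 60 frames per second

total_games = TRAIN_GAMES + VAL_GAMES
total_frames = TRAIN_FRAMES + VAL_FRAMES
total_shards = TRAIN_SHARDS + VAL_SHARDS

avg_frames_per_game = total_frames / total_games
avg_seconds_per_game = avg_frames_per_game / FPS
total_hours = total_frames / FPS / 3600  # counts both perspectives
avg_games_per_shard = total_games / total_shards

print(f"average frames per game: {avg_frames_per_game:.0f}")
print(f"average game length:     {avg_seconds_per_game:.0f} s")
print(f"total perspective-hours: {total_hours:.0f}")
print(f"average games per shard: {avg_games_per_shard:.0f}")
```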
## Shard format
Each `.pt` file contains a dict:
```python
{
    "states": {feature_name: Tensor},   # normalized game-state features
    "targets": {head_name: Tensor},     # controller-input targets
    "offsets": [int, ...],              # game boundary indices along time axis
    "n_games": int,                     # number of games in this shard
}
```
Multiple games are concatenated along the time axis (axis 0). The `offsets`
array marks where each game begins, enabling dynamic windowing during training
without pre-creating all sliding windows.
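The sketch below shows one way dynamic windowing over `offsets` could work: it maps a flat window index to a `(start, stop)` slice that never crosses a game boundary. It uses plain Python lists so it runs without PyTorch; the same slice could be applied to each tensor in `states` and `targets`. The helper names are hypothetical, and the code assumes `offsets` holds game start indices (a trailing end sentinel, if present, is dropped). MIMIC's own data loader may differ.

```python
import bisect


def game_spans(offsets, total_frames):
    """Turn game start offsets into (start, end) pairs along the time axis."""
    starts = list(offsets)
    # Assumption: offsets are game starts; drop a trailing sentinel if present.
    if starts and starts[-1] == total_frames:
        starts = starts[:-1]
    ends = starts[1:] + [total_frames]
    return list(zip(starts, ends))


def cumulative_window_counts(spans, window):
    """Running total of the windows of length `window` that fit in each game."""
    cumulative, running = [], 0
    for start, end in spans:
        running += max(0, end - start - window + 1)
        cumulative.append(running)
    return cumulative


def window_at(spans, window, index):
    """Map a flat window index to a (start, stop) slice inside a single game."""
    cumulative = cumulative_window_counts(spans, window)
    if not cumulative or not 0 <= index < cumulative[-1]:
        raise IndexError("window index out of range")
    game = bisect.bisect_right(cumulative, index)
    prior = cumulative[game - 1] if game else 0
    start = spans[game][0] + (index - prior)
    return start, start + window


# Toy shard: three games of 5, 7, and 8 frames concatenated on the time axis.
spans = game_spans([0, 5, 12], total_frames=20)
n_windows = cumulative_window_counts(spans, window=4)[-1]
slices = [window_at(spans, 4, i) for i in range(n_windows)]
print(slices)
```

Computing the slice on demand keeps memory use proportional to the shard itself rather than to the number of windows.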
## Preprocessing
All preprocessing is baked into the shards:
- **Categorical encoding** via `cat_maps.json` (ports, costumes, action states, projectile subtypes)
- **Normalization** via `norm_stats.json` (per-column z-score standardization)
- **Stick discretization** via `stick_clusters.json` (30 K-means clusters for main stick, 4 bins for L/R triggers)
- **C-stick** encoded as 5-way cardinal direction (neutral/up/down/left/right)
- **Self-controller inputs excluded** — model learns purely from game state, eliminating train/inference distribution shift
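To make the steps above concrete, here is a minimal sketch of z-score standardization, nearest-cluster stick discretization, and a 5-way C-stick encoding. The function names, the stick coordinate convention (centered at 0), the deadzone value, and the class ordering are all assumptions for illustration; the real values come from `norm_stats.json` and `stick_clusters.json`, whose schemas are not described here.

```python
import math


def zscore(value, mean, std):
    """Standardize one value with a column's mean and standard deviation."""
    return (value - mean) / std if std > 0 else 0.0


def nearest_cluster(point, centers):
    """Index of the cluster center closest to `point` (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def cstick_direction(x, y, deadzone=0.3):
    """Encode a C-stick position as 0=neutral, 1=up, 2=down, 3=left, 4=right.

    The deadzone and the class ordering are hypothetical.
    """
    if math.hypot(x, y) < deadzone:
        return 0
    if abs(y) >= abs(x):
        return 1 if y > 0 else 2
    return 3 if x < 0 else 4


# Made-up statistics and cluster centers, only to exercise the functions.
print(zscore(12.0, mean=10.0, std=4.0))
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
print(nearest_cluster((0.9, 0.1), centers))
print(cstick_direction(-1.0, 0.2))
```

Turning the main stick into a cluster index makes stick prediction a classification problem over the 30 centers rather than a regression over continuous coordinates.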
## Metadata files
| File | Description |
|------|-------------|
| `tensor_manifest.json` | Shard list, game counts, frame counts, train/val split |
| `norm_stats.json` | Per-column mean and standard deviation |
| `cat_maps.json` | Dynamic categorical mappings |
| `stick_clusters.json` | K-means cluster centers for stick positions and shoulder triggers |
## Usage
```python
from huggingface_hub import snapshot_download

# Downloads the full dataset (about 2.59 TB) into data/full.
snapshot_download("erickfm/mimic-melee", repo_type="dataset", local_dir="data/full")
```
Or use the one-command setup:
```bash
git clone https://github.com/erickfm/MIMIC && cd MIMIC
bash setup.sh --run
```
For a smaller version for quick experiments, see
[erickfm/mimic-melee-subset](https://huggingface.co/datasets/erickfm/mimic-melee-subset).
## Related
- [MIMIC](https://github.com/erickfm/MIMIC) — Imitation-learning bot trained on this data
- [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) — Raw source replays
- [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor) — .slp to parquet converter
- [Slippi](https://slippi.gg/) — Melee netplay client
## License
CC0 1.0 — Public domain.
Provider:
erickfm



