erickfm/mimic-melee-subset
Hugging Face · Updated 2026-03-18 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/erickfm/mimic-melee-subset
Description:
---
language:
- en
license: cc0-1.0
tags:
- melee
- smash-bros
- slippi
- imitation-learning
- controller-inputs
- fighting-games
- pytorch
pretty_name: MIMIC Melee Subset
size_categories:
- 1K<n<10K
---
# MIMIC Melee Subset
A small subset of [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee)
for quick experiments and development iteration. Same format, same
preprocessing — just fewer replays.
## Source
Sampled from [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7)
(~95,102 raw Slippi tournament replays, compiled by **altf4** on the
[Slippi Discord](https://discord.gg/slippi), CC0 licensed).
About 1,000 of these replays were converted to Parquet via
[slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor),
then tensorized via `tools/upload_dataset.py`. Each replay is tensorized from
both players' perspectives, so 1,000 replays × 2 = 2,000 games.
## Dataset statistics
| Split | Games | Frames | Shards |
|-------|-------|--------|--------|
| Train | 1,800 | 16,870,729 | 12 |
| Val | 200 | 1,860,593 | 2 |
| **Total** | **2,000** | **18,731,322** | **14** |
- **Total size:** 26.7 GB
- **Shard size:** ~1.9 GB each
- **Val split:** 10% (seed 42)
- **Format:** Each shard concatenates its games along the time axis and stores offset arrays marking the game boundaries, which allows dynamic windowing
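As a rough illustration of dynamic windowing, the sketch below enumerates fixed-length frame windows that stay inside a single game. It assumes the boundary list holds each game's start index plus a final entry for the shard's total frame count. `iter_windows`, `window`, and `stride` are hypothetical names chosen for this example, not part of the MIMIC codebase.

```python
from typing import Iterator, Sequence, Tuple


def iter_windows(
    boundaries: Sequence[int], window: int, stride: int = 1
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame ranges of length `window` within one game each.

    Assumption: `boundaries` is sorted and contains one entry per game start
    plus a final entry for the total frame count, e.g. [0, 5, 12] for two games.
    """
    for game_start, game_end in zip(boundaries[:-1], boundaries[1:]):
        last_start = game_end - window  # latest start that still fits in the game
        for start in range(game_start, last_start + 1, stride):
            yield start, start + window


# Two games of 5 and 7 frames, windows of 4 frames, stride 2.
windows = list(iter_windows([0, 5, 12], window=4, stride=2))
```

Because windows are derived from the boundaries at read time in this sketch, the window length can be changed between runs without rebuilding the shards. Games shorter than `window` simply yield nothing.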
## Shard format
Each `.pt` file contains a dict:
```python
{
    "states": {feature_name: Tensor},   # normalized game-state features
    "targets": {head_name: Tensor},     # controller-input targets
    "offsets": [int, ...],              # game boundary indices along time axis
    "n_games": int,                     # number of games in this shard
}
```
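The following is a minimal sketch of pulling one game out of a loaded shard. It assumes the time axis is the first dimension of every tensor, and that `offsets` holds either `n_games` start indices or `n_games + 1` boundaries; the card does not say which, so the helper accepts both. `game_bounds` and `slice_game` are hypothetical helpers, not functions from the MIMIC repository.

```python
from typing import Dict, List, Sequence, Tuple


def game_bounds(
    offsets: Sequence[int], n_games: int, n_frames: int
) -> List[Tuple[int, int]]:
    """Return the (start, end) frame range of each game in a shard."""
    bounds = [int(o) for o in offsets]
    if len(bounds) == n_games:
        # Starts only: close the last game at the end of the shard.
        bounds.append(n_frames)
    if len(bounds) != n_games + 1:
        raise ValueError("offsets length does not match n_games")
    return list(zip(bounds[:-1], bounds[1:]))


def slice_game(shard: Dict, game_idx: int) -> Dict:
    """Slice every state and target down to the frames of one game."""
    first = next(iter(shard["states"].values()))
    start, end = game_bounds(shard["offsets"], shard["n_games"], len(first))[game_idx]
    return {
        "states": {k: v[start:end] for k, v in shard["states"].items()},
        "targets": {k: v[start:end] for k, v in shard["targets"].items()},
    }


# With PyTorch installed, a real shard would be loaded along these lines
# (the file name is a placeholder):
#   shard = torch.load("data/subset/<shard>.pt")
#   game = slice_game(shard, 0)
```

The helpers only rely on `len()` and slicing, so they work on tensors and on plain lists alike, which makes them easy to check without downloading a shard.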
## Metadata files
| File | Description |
|------|-------------|
| `tensor_manifest.json` | Shard list, game counts, frame counts, train/val split |
| `norm_stats.json` | Per-column mean and standard deviation |
| `cat_maps.json` | Dynamic categorical mappings |
| `stick_clusters.json` | K-means cluster centers for stick positions and shoulder triggers |
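The per-column mean and standard deviation suggest z-score normalization, although the card does not state this outright. Under that assumption, a normalized value can be mapped back to game units as sketched below. The JSON layout and the column name `pos_x` are hypothetical; the real `norm_stats.json` may be organized differently.

```python
import json
from typing import Iterable, List

# Hypothetical layout and column name, used only to make the sketch runnable.
EXAMPLE_STATS = json.loads('{"pos_x": {"mean": 2.0, "std": 4.0}}')


def denormalize(values: Iterable[float], mean: float, std: float) -> List[float]:
    """Invert z-score normalization: x = z * std + mean."""
    return [v * std + mean for v in values]


col = EXAMPLE_STATS["pos_x"]
restored = denormalize([0.0, 1.0, -0.5], col["mean"], col["std"])
```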
## Usage
```bash
git clone https://github.com/erickfm/MIMIC && cd MIMIC
bash setup.sh --repo erickfm/mimic-melee-subset --data-dir data/subset --run --model tiny
```
Or manually:
```python
from huggingface_hub import snapshot_download
snapshot_download("erickfm/mimic-melee-subset", repo_type="dataset", local_dir="data/subset")
```
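Since the subset is 26.7 GB, it can be convenient to fetch only the JSON metadata first. One possible sketch uses the `allow_patterns` argument of `snapshot_download`; `fetch_metadata` and `METADATA_PATTERNS` are illustrative names introduced here.

```python
from typing import List

# Glob pattern matching the JSON metadata files (manifest, stats, maps, clusters).
METADATA_PATTERNS: List[str] = ["*.json"]


def fetch_metadata(local_dir: str = "data/subset") -> str:
    """Download only the JSON metadata files and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        "erickfm/mimic-melee-subset",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=METADATA_PATTERNS,
    )
```

Calling `fetch_metadata()` requires network access; the shards can then be pulled later with the unrestricted `snapshot_download` call shown above.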
For the full dataset (~94k replays, 2.59 TB), see
[erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee).
## Related
- [MIMIC](https://github.com/erickfm/MIMIC) — Imitation-learning bot trained on this data
- [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee) — Full dataset
- [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) — Raw source replays
- [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor) — .slp to parquet converter
## License
CC0 1.0 — Public domain.
Provider:
erickfm



