
erickfm/mimic-melee-subset

Hugging Face | Updated 2026-03-18 | Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/erickfm/mimic-melee-subset
Resource overview:
---
language:
- en
license: cc0-1.0
tags:
- melee
- smash-bros
- slippi
- imitation-learning
- controller-inputs
- fighting-games
- pytorch
pretty_name: MIMIC Melee Subset
size_categories:
- 1K<n<10K
---

# MIMIC Melee Subset

A small subset of [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee) for quick experiments and development iteration. Same format, same preprocessing, just fewer replays.

## Source

Sampled from [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7) (~95,102 raw Slippi tournament replays, compiled by **altf4** on the [Slippi Discord](https://discord.gg/slippi), CC0 licensed).

~1,000 replays converted to parquet via [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor), then tensorized via `tools/upload_dataset.py`. Each replay is tensorized from both players' perspectives (1,000 replays × 2 = 2,000 games).

## Dataset statistics

| Split | Games | Frames | Shards |
|-------|-------|--------|--------|
| Train | 1,800 | 16,870,729 | 12 |
| Val | 200 | 1,860,593 | 2 |
| **Total** | **2,000** | **18,731,322** | **14** |

- **Total size:** 26.7 GB
- **Shard size:** ~1.9 GB each
- **Val split:** 10% (seed 42)
- **Format:** Per-game concatenated tensors with offset arrays for dynamic windowing

## Shard format

Each `.pt` file contains a dict:

```python
{
    "states": {feature_name: Tensor},   # normalized game-state features
    "targets": {head_name: Tensor},     # controller-input targets
    "offsets": [int, ...],              # game boundary indices along time axis
    "n_games": int,                     # number of games in this shard
}
```

## Metadata files

| File | Description |
|------|-------------|
| `tensor_manifest.json` | Shard list, game counts, frame counts, train/val split |
| `norm_stats.json` | Per-column mean and standard deviation |
| `cat_maps.json` | Dynamic categorical mappings |
| `stick_clusters.json` | K-means cluster centers for stick positions and shoulder triggers |

## Usage

```bash
git clone https://github.com/erickfm/MIMIC && cd MIMIC
bash setup.sh --repo erickfm/mimic-melee-subset --data-dir data/subset --run --model tiny
```

Or manually:

```python
from huggingface_hub import snapshot_download

snapshot_download("erickfm/mimic-melee-subset", repo_type="dataset", local_dir="data/subset")
```

For the full dataset (~94k replays, 2.59 TB), see [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee).

## Related

- [MIMIC](https://github.com/erickfm/MIMIC): Imitation-learning bot trained on this data
- [erickfm/mimic-melee](https://huggingface.co/datasets/erickfm/mimic-melee): Full dataset
- [slippi-public-dataset-v3.7](https://huggingface.co/datasets/erickfm/slippi-public-dataset-v3.7): Raw source replays
- [slippi-frame-extractor](https://github.com/erickfm/slippi-frame-extractor): .slp to parquet converter

## License

CC0 1.0 (public domain).
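
## Example: windowing with offsets

The shard format above stores every game back to back along the time axis and records the game boundaries in `offsets`. The following is a minimal sketch of how fixed-length training windows could be cut so that none of them spans two games. It is not the MIMIC loader. Several things here are assumptions: the card does not say whether `offsets` holds `n_games + 1` boundaries or `n_games` start indices (the sketch accepts both), time is assumed to be the first axis of every tensor, and the feature names `x` and `btn`, the helper names, and the window and stride values are all hypothetical. A plain-Python stand-in replaces a real shard so the example runs without downloading data; a real shard would presumably be read with `torch.load`, and its tensors slice the same way along the first axis.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def game_bounds(offsets: Sequence[int], n_games: int,
                total_frames: int) -> List[Tuple[int, int]]:
    """Return one (start, end) frame range per game in a shard.

    Assumption: `offsets` is either n_games + 1 boundaries or n_games
    start indices; the dataset card does not specify which.
    """
    bounds = [int(o) for o in offsets]
    if len(bounds) == n_games:          # start indices only: close the last game
        bounds.append(total_frames)
    if len(bounds) != n_games + 1:
        raise ValueError("offsets length does not match n_games")
    return list(zip(bounds[:-1], bounds[1:]))


def iter_windows(bounds: Sequence[Tuple[int, int]], window: int,
                 stride: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) windows that stay inside a single game."""
    for g_start, g_end in bounds:
        # Last valid start is g_end - window, so the window ends at or before g_end.
        for start in range(g_start, g_end - window + 1, stride):
            yield start, start + window


def slice_features(features: Dict[str, Sequence], start: int,
                   end: int) -> Dict[str, Sequence]:
    """Slice every feature along the (assumed) leading time axis."""
    return {name: values[start:end] for name, values in features.items()}


# Synthetic stand-in for a shard: two games of 4 and 6 frames.
# Feature names are hypothetical; real shards hold torch tensors.
shard = {
    "states": {"x": list(range(10))},
    "targets": {"btn": [i % 2 for i in range(10)]},
    "offsets": [0, 4, 10],
    "n_games": 2,
}

bounds = game_bounds(shard["offsets"], shard["n_games"], total_frames=10)
windows = list(iter_windows(bounds, window=3, stride=2))
first = slice_features(shard["states"], *windows[0])
```

Because windows are derived from `offsets` at read time rather than stored, the window length and stride can change between experiments without re-tensorizing the shards, which is one plausible reading of "dynamic windowing" in the statistics section.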
Provider:
erickfm