erickfm/melee-ranked-replays
Hugging Face · Updated 2026-04-20 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/erickfm/melee-ranked-replays
Resource description:
---
license: mit
task_categories:
- reinforcement-learning
- other
language:
- en
tags:
- super-smash-bros-melee
- slippi
- gamecube
- behavior-cloning
- replays
pretty_name: Melee Ranked Replays
size_categories:
- 100K<n<1M
---
# Melee Ranked Replays
Anonymized Slippi ranked replays (platinum+) from Super Smash Bros. Melee,
sharded by character and rank pair. Built for behavior-cloning and other
replay-driven ML work on Melee — notably [MIMIC](https://github.com/erickfm/MIMIC).
## Contents
Raw `.slp` files grouped into tarballs by `(character, rank_pair, source_archive)`,
organized into per-character folders:
```
{CHAR}/
  {CHAR}_{rank_pair}_a{N}.tar.gz
metadata/
  metadata_a{N}.json
```
- **Characters (25):** BOWSER, CPTFALCON, DK, DOC, FALCO, FOX, GAMEANDWATCH,
GANONDORF, ICE_CLIMBERS, JIGGLYPUFF, KIRBY, LINK, LUIGI, MARIO, MARTH,
MEWTWO, NESS, PEACH, PICHU, PIKACHU, ROY, SAMUS, YLINK, YOSHI, ZELDA_SHEIK
(ZELDA and SHEIK collapsed; POPO and NANA collapsed to ICE_CLIMBERS)
- **Rank pairs:** `diamond-diamond`, `diamond-platinum`, `master-diamond`,
`master-master`, `master-platinum`, `platinum-platinum` (6 combos, higher
rank first in mixed pairs)
- **Source archives:** `a1`..`a6`, corresponding to the 6 original
anonymized ranked dumps. The archive suffix exists so that incremental
uploads don't collide; to get "everything Fox at master-master", pull
every `FOX/FOX_master-master_a*.tar.gz`.

Each shard holds the raw `.slp` files, with no preprocessing, normalization, or
tensorization applied. Use `peppi-py`, `py-slippi`, or `libmelee` to parse.
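
As a rough sketch of the "everything Fox at master-master" case, the glob for one bucket can be passed to `huggingface_hub.snapshot_download` through `allow_patterns`. The helper names `shard_glob` and `download_shards` and the `local_dir` default are illustrative assumptions, not part of the dataset tooling.

```python
REPO_ID = "erickfm/melee-ranked-replays"


def shard_glob(character: str, rank_pair: str) -> str:
    """Glob matching every archive's shard for one (character, rank_pair) bucket."""
    return f"{character}/{character}_{rank_pair}_a*.tar.gz"


def download_shards(character: str, rank_pair: str, local_dir: str = "replays") -> str:
    """Download the matching shards and return the local snapshot path."""
    # Imported lazily so shard_glob can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        allow_patterns=[shard_glob(character, rank_pair)],
        local_dir=local_dir,
    )
```

Calling `download_shards("FOX", "master-master")` would fetch only the Fox master-master tarballs; it needs network access, so it is not run here.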
### Duplication
Each replay is placed into **both players' character buckets** (unless it's
a ditto). A MARTH vs FALCO `diamond-platinum` replay appears in both
`MARTH/MARTH_diamond-platinum_aN.tar.gz` and `FALCO/FALCO_diamond-platinum_aN.tar.gz`.
A FOX ditto appears only once, in `FOX/FOX_diamond-diamond_aN.tar.gz`. As a result,
downloading "all Marth games at master-master" needs only the `MARTH/` folder
(not a join across 25 per-player files), at the cost of ~90% duplication on
the full dataset, since every non-ditto replay is stored twice.
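
If you combine several character folders, you will probably want to drop the duplicates again. The sketch below keeps the first copy of each replay, assuming (this is an assumption, not something the dataset guarantees) that a replay's filename is unique across shards. `iter_unique_replays` is a hypothetical helper name.

```python
import tarfile
from pathlib import Path
from typing import Iterable, Iterator, Tuple


def iter_unique_replays(tar_paths: Iterable[str]) -> Iterator[Tuple[str, bytes]]:
    """Yield (filename, raw bytes) once per replay across several shards."""
    seen = set()
    for tar_path in tar_paths:
        with tarfile.open(tar_path, "r:gz") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                # Compare on the base name, ignoring any directory prefix in the tar.
                name = Path(member.name).name
                if name in seen:
                    continue  # already yielded from another character's bucket
                seen.add(name)
                yield name, tar.extractfile(member).read()
```

The returned bytes are the untouched `.slp` contents, ready to be written to disk or handed to a parser.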
### Metadata
`metadata/metadata_a{N}.json` is a flat JSON list. One entry per **replay**
(not per bucket), schema:
```json
{
  "filename": "diamond-diamond-6cf8c1ee745993cefe0c88db.slp",
  "p1": "NESS",
  "p2": "JIGGLYPUFF",
  "rank": "diamond-diamond",
  "archive": "3"
}
```
`p1` and `p2` use the same collapsed character names as the folder/bucket
filenames. `archive` is stored as a string (`"3"`, not `3`).
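
Because the metadata is one row per replay, finding a matchup is a simple filter. A minimal sketch, assuming the schema above; `load_metadata` and `find_matchup` are hypothetical helper names:

```python
import json
from typing import Dict, List, Optional


def load_metadata(path: str) -> List[Dict[str, str]]:
    """Read one metadata_a{N}.json file (a flat JSON list of replay entries)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_matchup(
    entries: List[Dict[str, str]],
    char_a: str,
    char_b: str,
    rank: Optional[str] = None,
) -> List[Dict[str, str]]:
    """Entries where the two characters face each other, in either player slot."""
    wanted = {char_a, char_b}  # a ditto collapses to a one-element set
    return [
        e for e in entries
        if {e["p1"], e["p2"]} == wanted and (rank is None or e["rank"] == rank)
    ]
```

Comparing sets rather than `(p1, p2)` tuples means the caller does not need to know which player slot a character ended up in.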
## Build pipeline
Source: six anonymized ranked archives covering ~850k total replays at
platinum+ rank. Each archive is processed independently by
[`tools/shard_and_upload_ranked.py`](https://github.com/erickfm/MIMIC/blob/main/tools/shard_and_upload_ranked.py)
in the MIMIC repo.
### Per-file work (parallel, one worker per CPU)
For each `.slp` file in an archive:
1. **Read header only** via `peppi_py.read_slippi(path, skip_frames=True)`.
Skipping frames makes it fast (milliseconds per file), since only the
Start event is needed, not the ~10k frames of gameplay.
2. **Pull the 2 players** out of `game.start.players`, reject if not exactly 2.
3. **Map each player's character int to a name** via a lookup built from
`melee.Character` enum, with two collapses:
   - ZELDA (19) and SHEIK (7) → `ZELDA_SHEIK` (same fighter mid-match)
   - POPO (10) and NANA (11) → `ICE_CLIMBERS` (two climbers are one unit)
4. **Reject junk characters**: WIREFRAME_MALE/FEMALE, GIGA_BOWSER, SANDBAG,
UNKNOWN — not legal tournament characters; replays featuring them are
debug/test files.
5. **Parse rank from filename** via regex — the `{rank1}-{rank2}` prefix.

Per-file output: `(filename, p1_name, p2_name, rank_pair, error_or_None)`.
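
Steps 3 to 5 can be sketched roughly as follows. This is not the code from `shard_and_upload_ranked.py`: it starts from enum *names* rather than character ints, skips the header read entirely, and the regex, error strings, and function names are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Two collapses described above: Zelda/Sheik and the two Ice Climbers.
COLLAPSE = {
    "ZELDA": "ZELDA_SHEIK", "SHEIK": "ZELDA_SHEIK",
    "POPO": "ICE_CLIMBERS", "NANA": "ICE_CLIMBERS",
}
JUNK = {"WIREFRAME_MALE", "WIREFRAME_FEMALE", "GIGA_BOWSER", "SANDBAG", "UNKNOWN"}
RANKS = "master|diamond|platinum"
# Assumed filename shape: "{rank1}-{rank2}-{id}.slp".
RANK_RE = re.compile(rf"^({RANKS})-({RANKS})-")


def bucket_name(enum_name: str) -> Optional[str]:
    """Collapsed character name, or None for a junk character."""
    if enum_name in JUNK:
        return None
    return COLLAPSE.get(enum_name, enum_name)


def parse_rank_pair(filename: str) -> Optional[str]:
    """Extract the "{rank1}-{rank2}" prefix, or None if it does not match."""
    m = RANK_RE.match(filename)
    return f"{m.group(1)}-{m.group(2)}" if m else None


def classify(filename: str, p1_enum: str, p2_enum: str) -> Tuple:
    """Build a (filename, p1_name, p2_name, rank_pair, error_or_None) tuple."""
    p1, p2 = bucket_name(p1_enum), bucket_name(p2_enum)
    rank = parse_rank_pair(filename)
    if p1 is None or p2 is None:
        return (filename, p1, p2, rank, "junk character")
    if rank is None:
        return (filename, p1, p2, rank, "unparseable rank")
    return (filename, p1, p2, rank, None)
```

Returning the error as the last tuple element, rather than raising, keeps a parallel worker pool simple: every file produces exactly one result, and failures can be counted afterwards.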
### Bucketing
Each successful replay enters up to two buckets keyed by
`(character, rank_pair)`:
- One for player 1's character
- One for player 2's character (**skipped if same char** — no double-counting dittos)
Metadata is a flat list of `{filename, p1, p2, rank, archive}` entries, one
row per replay.
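
The bucketing rule is small enough to show in full. A minimal sketch, assuming the per-file tuples described earlier as input; `bucket_keys` and `build_buckets` are hypothetical names:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

BucketKey = Tuple[str, str]  # (character, rank_pair)


def bucket_keys(p1: str, p2: str, rank_pair: str) -> List[BucketKey]:
    """One key per distinct character, so a ditto yields a single key."""
    keys = [(p1, rank_pair)]
    if p2 != p1:
        keys.append((p2, rank_pair))
    return keys


def build_buckets(results: Iterable[Tuple]) -> Dict[BucketKey, List[str]]:
    """Group filenames by bucket, skipping files that carried an error."""
    buckets = defaultdict(list)
    for filename, p1, p2, rank_pair, error in results:
        if error is not None:
            continue
        for key in bucket_keys(p1, p2, rank_pair):
            buckets[key].append(filename)
    return dict(buckets)
```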
### Tar + upload
Buckets are compressed one at a time (`tarfile` w:gz, compresslevel=6),
uploaded via `huggingface_hub.HfApi.upload_file`, then the local tar is
deleted. Already-uploaded paths are skipped, so the tool is resume-safe.
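
A hedged sketch of the compress, upload, delete loop for a single bucket. The flat `arcname` layout inside the tarball and the way `existing` is obtained are assumptions; the real script may differ.

```python
import os
import tarfile
from typing import Iterable, Set


def shard_repo_path(character: str, rank_pair: str, archive: str) -> str:
    """Repo path of a shard, following the layout in the Contents section."""
    return f"{character}/{character}_{rank_pair}_a{archive}.tar.gz"


def write_shard(tar_path: str, slp_paths: Iterable[str]) -> None:
    """Gzip one bucket's replays into a tarball at compresslevel 6."""
    with tarfile.open(tar_path, "w:gz", compresslevel=6) as tar:
        for path in slp_paths:
            # Assumption: files are stored flat, without their source directories.
            tar.add(path, arcname=os.path.basename(path))


def upload_if_missing(tar_path: str, repo_path: str, repo_id: str,
                      existing: Set[str]) -> bool:
    """Upload a shard unless the repo already has it; return True if uploaded."""
    if repo_path in existing:
        return False  # resume-safe: already-uploaded paths are skipped
    # Lazy import: only needed when an upload actually happens.
    from huggingface_hub import HfApi

    HfApi().upload_file(
        path_or_fileobj=tar_path,
        path_in_repo=repo_path,
        repo_id=repo_id,
        repo_type="dataset",
    )
    os.remove(tar_path)  # free local disk once the shard is in the repo
    return True
```

One way to populate `existing` would be `set(HfApi().list_repo_files(repo_id, repo_type="dataset"))`, fetched once at startup.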
## Intended use
Training behavior-cloning models on human gameplay (MIMIC, HAL, and similar
projects), replay analysis, frame-data research. If you're building
something on this, drop me a line.
## License
MIT. Replays were originally published by the Slippi/ranked community in
anonymized form; this is a re-sharded redistribution for ML convenience.
Provider:
erickfm



