
erickfm/melee-ranked-replays

Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/erickfm/melee-ranked-replays
Description:
---
license: mit
task_categories:
- reinforcement-learning
- other
language:
- en
tags:
- super-smash-bros-melee
- slippi
- gamecube
- behavior-cloning
- replays
pretty_name: Melee Ranked Replays
size_categories:
- 100K<n<1M
---

# Melee Ranked Replays

Anonymized Slippi ranked replays (platinum+) from Super Smash Bros. Melee, sharded by character and rank pair. Built for behavior-cloning and other replay-driven ML work on Melee, notably [MIMIC](https://github.com/erickfm/MIMIC).

## Contents

Raw `.slp` files grouped into tarballs by `(character, rank_pair, source_archive)`, organized into per-character folders:

```
{CHAR}/
  {CHAR}_{rank_pair}_a{N}.tar.gz
metadata/
  metadata_a{N}.json
```

- **Characters (25):** BOWSER, CPTFALCON, DK, DOC, FALCO, FOX, GAMEANDWATCH, GANONDORF, ICE_CLIMBERS, JIGGLYPUFF, KIRBY, LINK, LUIGI, MARIO, MARTH, MEWTWO, NESS, PEACH, PICHU, PIKACHU, ROY, SAMUS, YLINK, YOSHI, ZELDA_SHEIK (ZELDA and SHEIK collapsed; POPO and NANA collapsed to ICE_CLIMBERS)
- **Rank pairs:** `diamond-diamond`, `diamond-platinum`, `master-diamond`, `master-master`, `master-platinum`, `platinum-platinum` (6 combos, higher rank first in mixed pairs)
- **Source archives:** `a1`..`a6`, corresponding to the 6 original anonymized ranked dumps. The archive suffix exists so incremental uploads don't collide; if you want "everything Fox at master-master", pull every `FOX/FOX_master-master_a*.tar.gz`.

Each shard holds the raw `.slp` files: no preprocessing, normalization, or tensorization is applied. Use `peppi-py`, `py-slippi`, or `libmelee` to parse.

### Duplication

Each replay is placed into **both players' character buckets** (unless it's a ditto). A MARTH vs FALCO `diamond-platinum` replay appears in both `MARTH/MARTH_diamond-platinum_aN.tar.gz` and `FALCO/FALCO_diamond-platinum_aN.tar.gz`. A FOX ditto only appears once in `FOX/FOX_diamond-diamond_aN.tar.gz`.
This means downloading "all Marth games at master-master" needs only the `MARTH/` folder (not a join across 25 per-player files), at the cost of ~90% duplication on the full dataset.

### Metadata

`metadata/metadata_a{N}.json` is a flat JSON list. One entry per **replay** (not per bucket), schema:

```json
{
  "filename": "diamond-diamond-6cf8c1ee745993cefe0c88db.slp",
  "p1": "NESS",
  "p2": "JIGGLYPUFF",
  "rank": "diamond-diamond",
  "archive": "3"
}
```

`p1` and `p2` use the same collapsed character names as the folder/bucket filenames. `archive` is a string.

## Build pipeline

Source: six anonymized ranked archives covering ~850k total replays at platinum+ rank. Each archive is processed independently by [`tools/shard_and_upload_ranked.py`](https://github.com/erickfm/MIMIC/blob/main/tools/shard_and_upload_ranked.py) in the MIMIC repo.

### Per-file work (parallel, one worker per CPU)

For each `.slp` file in an archive:

1. **Read header only** via `peppi_py.read_slippi(path, skip_frames=True)`. Skipping frames makes it fast (ms per file) since we only need the Start event, not the ~10k frames of gameplay.
2. **Pull the 2 players** out of `game.start.players`, reject if not exactly 2.
3. **Map each player's character int to a name** via a lookup built from `melee.Character` enum, with two collapses:
   - ZELDA (19) and SHEIK (7) → `ZELDA_SHEIK` (same fighter mid-match)
   - POPO (10) and NANA (11) → `ICE_CLIMBERS` (two climbers are one unit)
4. **Reject junk characters**: WIREFRAME_MALE/FEMALE, GIGA_BOWSER, SANDBAG, UNKNOWN. These are not legal tournament characters; replays featuring them are debug/test files.
5. **Parse rank from filename** via regex: the `{rank1}-{rank2}` prefix.

Per-file output: `(filename, p1_name, p2_name, rank_pair, error_or_None)`.
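
Steps 3 to 5 can be sketched without a replay parser by starting from character names that have already been looked up. This is a minimal illustration, not the code in `tools/shard_and_upload_ranked.py`: the function name `classify`, the error strings, and the exact regex are assumptions, and the filename in the usage line is made up.

```python
import re
from typing import List, Optional, Tuple

# Assumed filename shape: "{rank1}-{rank2}-{hash}.slp". The real tool's regex
# is not shown in this card, so this pattern is only a plausible stand-in.
RANK_RE = re.compile(r"^(master|diamond|platinum)-(master|diamond|platinum)-")

# The two collapses named in step 3.
COLLAPSE = {
    "ZELDA": "ZELDA_SHEIK",
    "SHEIK": "ZELDA_SHEIK",
    "POPO": "ICE_CLIMBERS",
    "NANA": "ICE_CLIMBERS",
}

# Characters rejected in step 4.
JUNK = {"WIREFRAME_MALE", "WIREFRAME_FEMALE", "GIGA_BOWSER", "SANDBAG", "UNKNOWN"}

Row = Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]


def classify(filename: str, raw_chars: List[str]) -> Row:
    """Return (filename, p1_name, p2_name, rank_pair, error_or_None)."""
    if len(raw_chars) != 2:
        return (filename, None, None, None, "expected exactly 2 players")
    if any(c in JUNK for c in raw_chars):
        return (filename, None, None, None, "junk character")
    # Apply the collapses; any other name passes through unchanged.
    p1, p2 = (COLLAPSE.get(c, c) for c in raw_chars)
    m = RANK_RE.match(filename)
    if m is None:
        return (filename, p1, p2, None, "no rank prefix in filename")
    return (filename, p1, p2, f"{m.group(1)}-{m.group(2)}", None)


# Hypothetical filename, for illustration only.
print(classify("master-diamond-0123abcd.slp", ["SHEIK", "FOX"]))
# ('master-diamond-0123abcd.slp', 'ZELDA_SHEIK', 'FOX', 'master-diamond', None)
```

Returning the error as the last tuple element, rather than raising, mirrors the per-file output shape above and lets a parallel worker pool collect failures without aborting the batch.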
### Bucketing

Each successful replay enters up to two buckets keyed by `(character, rank_pair)`:

- One for player 1's character
- One for player 2's character (**skipped if same char**: no double-counting dittos)

Metadata is a flat list of `{filename, p1, p2, rank, archive}` entries, one row per replay.

### Tar + upload

Buckets are compressed one at a time (`tarfile` w:gz, compresslevel=6), uploaded via `huggingface_hub.HfApi.upload_file`, then the local tar is deleted. Already-uploaded paths are skipped, so the tool is resume-safe.

## Intended use

Training behavior-cloning models on human gameplay (MIMIC, HAL, and similar projects), replay analysis, frame-data research. If you're building something on this, drop me a line.

## License

MIT. Replays were originally published by the Slippi/ranked community in anonymized form; this is a re-sharded redistribution for ML convenience.
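
## Example: locating a replay's shards

The folder layout, the duplication rule, and the metadata schema together determine which tarballs hold a given replay. The sketch below derives those paths from one metadata entry. It assumes the `archive` string `"3"` corresponds to the `a3` suffix and that `rank` matches the rank pair in the shard name; the helper name `shard_paths` is hypothetical and not part of any published tool.

```python
from typing import Dict, List


def shard_paths(entry: Dict[str, str]) -> List[str]:
    """Shard paths expected to contain the replay described by `entry`."""
    # Assumption: archive "3" maps to the "a3" suffix in shard filenames.
    suffix = f"{entry['rank']}_a{entry['archive']}.tar.gz"
    chars = [entry["p1"]]
    if entry["p2"] != entry["p1"]:  # dittos are stored once
        chars.append(entry["p2"])
    return [f"{c}/{c}_{suffix}" for c in chars]


entry = {
    "filename": "diamond-diamond-6cf8c1ee745993cefe0c88db.slp",
    "p1": "NESS",
    "p2": "JIGGLYPUFF",
    "rank": "diamond-diamond",
    "archive": "3",
}
print(shard_paths(entry))
# ['NESS/NESS_diamond-diamond_a3.tar.gz', 'JIGGLYPUFF/JIGGLYPUFF_diamond-diamond_a3.tar.gz']
```

When collecting replays across several characters, deduplicating by `filename` avoids counting a non-ditto game twice.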
Provider:
erickfm