
Fraser/pico-8-games

Source: Hugging Face. Updated 2026-03-21. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Fraser/pico-8-games
Resource overview:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-generation
- image-to-text
language:
- en
tags:
- pico-8
- games
- pixel-art
- lua
- code-generation
- retro
- fantasy-console
- sprites
- chiptune
- game-development
pretty_name: "PICO-8 Games"
size_categories:
- 10K<n<100K
---

# PICO-8 Games Dataset

<p align="center">
  <img src="showcase.gif" alt="PICO-8 Games Showcase" width="384">
</p>

**The first multimodal dataset of PICO-8 games.** 10,967 cartridges scraped from the [Lexaloffle BBS](https://www.lexaloffle.com/bbs/?cat=7#sub=2), each decomposed into Lua source code, pixel-art spritesheets, tile maps, sound effects, music patterns, and metadata.

<p align="center">
  <img src="hero_grid.png" alt="Top 48 PICO-8 games by star count" width="800">
  <br>
  <em>Label screenshots from the top 48 games by star count</em>
</p>

## What's Inside

Every PICO-8 cartridge is a self-contained game packed into a single file. This dataset cracks each one open into its component parts:

<p align="center">
  <img src="modalities.png" alt="Cart modalities breakdown" width="800">
  <br>
  <em>The anatomy of a PICO-8 cartridge (Celeste by Matt Thorson & Noel Berry)</em>
</p>

| Modality | Format | Description |
|----------|--------|-------------|
| **Lua source code** | String (up to 65,535 chars) | The complete game logic in PICO-8's Lua dialect |
| **Spritesheet** | 128x128 PNG (16 colors) | Up to 256 8x8 sprites (all of the game's pixel art) |
| **Label image** | 128x128 PNG | The developer's chosen screenshot representing the game |
| **Tile map** | 128x32 grid of tile indices | The spatial layout of game levels |
| **Sound effects** | 64 SFX slots, hex-encoded | Each SFX has 32 notes with pitch, waveform, volume, and effects |
| **Music patterns** | 64 patterns, hex-encoded | 4-channel music sequencer data referencing SFX slots |
| **Sprite flags** | 256 bytes | 8 boolean flags per sprite, used for collision/behavior |

## Dataset Schema

```python
{
    # --- Metadata ---
    "cart_id": str,             # Unique cart identifier (lid)
    "title": str,               # Game title
    "author": str,              # Author username
    "description": str,         # Author's description (when available)
    "tags": list[str],          # User-applied tags ("platformer", "shooter", etc.)
    "stars": int,               # Community star/like count (primary quality signal)
    "reply_count": int,         # Number of thread replies (engagement signal)
    "date_posted": str,         # Original post date (YYYY-MM-DD HH:MM:SS)
    "date_updated": str,        # Last update date
    "license": str,             # "CC4-BY-NC-SA" or "" (71% are CC4-licensed)
    "thread_id": int,           # BBS thread ID
    "thread_url": str,          # Direct link to the BBS thread

    # --- Code ---
    "lua_code": str,            # Complete Lua source code
    "token_count": int,         # Approximate PICO-8 token count
    "char_count": int,          # Character count of Lua code

    # --- Quality flags ---
    "has_init": bool,           # Has _init() function
    "has_update": bool,         # Has _update() or _update60() function
    "has_draw": bool,           # Has _draw() function
    "is_duplicate": bool,       # Exact code duplicate of a higher-starred cart

    # --- Visual data ---
    "spritesheet": Image,       # 128x128 RGB spritesheet (PICO-8's 16-color palette)
    "label_image": Image,       # 128x128 cart label screenshot (if captured by dev)
    "map_image": Image,         # Rendered tile map using spritesheet (if map data exists)

    # --- Raw data (for lossless roundtripping) ---
    "gfx_hex": str,             # Raw spritesheet as hex digits
    "map_data": list[int],      # Raw tile indices (uint8)
    "sprite_flags": list[int],  # Raw sprite flags (uint8)
    "sfx_hex": str,             # Raw sound effect data
    "music_hex": str,           # Raw music pattern data
}
```

## Quick Start

```python
from datasets import load_dataset

ds = load_dataset("Fraser/pico-8-games")

# Browse the top games
top = ds["train"].sort("stars", reverse=True)
for row in top.select(range(10)):
    print(f"{row['title']:30s} {row['stars']:4d} stars  by {row['author']}")

# Filter to high-quality, unique, CC4-licensed games
hq = ds["train"].filter(lambda x: (
    x["stars"] >= 10
    and not x["is_duplicate"]
    and x["has_init"]
    and x["has_update"]
    and x["has_draw"]
    and x["license"] == "CC4-BY-NC-SA"
))
print(f"High-quality subset: {len(hq)} games")
```

## Running a Cart

PICO-8 carts are playable games. To run one:

**1. In the browser:** Visit the `thread_url` for any cart and click "Play" on the BBS page.

**2. With PICO-8** ($15, [lexaloffle.com](https://www.lexaloffle.com/pico-8.php)): Reconstruct the `.p8` text file:

```python
# `ds` is the dataset loaded in Quick Start
row = ds["train"][0]

# Build the .p8 file from dataset columns
p8_text = "pico-8 cartridge // http://www.pico-8.com\nversion 42\n"
p8_text += f"__lua__\n{row['lua_code']}\n"
p8_text += f"__gfx__\n{row['gfx_hex']}\n"

# Add map data (convert byte array back to hex lines, 128 tiles per line)
map_bytes = row["map_data"]
if map_bytes:
    p8_text += "__map__\n"
    for y in range(0, len(map_bytes), 128):
        line = "".join(f"{b:02x}" for b in map_bytes[y:y+128])
        p8_text += line + "\n"

# Add sprite flags
flags = row["sprite_flags"]
if flags:
    p8_text += "__gff__\n"
    p8_text += "".join(f"{b:02x}" for b in flags) + "\n"

p8_text += f"__sfx__\n{row['sfx_hex']}\n"
p8_text += f"__music__\n{row['music_hex']}\n"

# Write as UTF-8 so any non-ASCII characters in the Lua source survive
with open("game.p8", "w", encoding="utf-8") as f:
    f.write(p8_text)

# Then: pico8 -run game.p8
```

## Extracting Components

### Render the spritesheet

Each cart's spritesheet is a 128x128 image using PICO-8's fixed 16-color palette. It is stored both as a rendered PNG and as raw hex for lossless access.

```python
# The spritesheet is already a PIL Image
row = ds["train"][0]
row["spritesheet"].save("sprites.png")

# Or parse individual 8x8 sprites from the hex data
gfx = row["gfx_hex"]
lines = gfx.strip().split("\n")

# Each hex digit = one pixel (4-bit color index)
# Sprite 0 is at (0,0), sprite 1 at (8,0), ..., sprite 16 at (0,8)
sprite_id = 42
sx = (sprite_id % 16) * 8
sy = (sprite_id // 16) * 8
for y in range(8):
    row_pixels = lines[sy + y][sx:sx + 8]
    print(row_pixels)  # e.g. "00770700": each digit is a color index
```

### Extract the tile map

The map is a grid of tile indices referencing sprites from the spritesheet:

```python
import numpy as np
from PIL import Image

# PICO-8's fixed 16-color palette (RGB), indexed by color number
PICO8_PALETTE = [
    (0,0,0), (29,43,83), (126,37,83), (0,135,81),
    (171,82,54), (95,87,79), (194,195,199), (255,241,232),
    (255,0,77), (255,163,0), (255,236,39), (0,228,54),
    (41,173,255), (131,118,156), (255,119,168), (255,204,170),
]

row = ds["train"][0]
map_data = row["map_data"]  # flat list of uint8 tile indices

# Reshape to 32 rows x 128 columns (some carts use 64 rows, sharing sprite memory)
width = 128
height = len(map_data) // width
# Drop any trailing partial row so the reshape cannot fail
tiles = np.array(map_data[:height * width]).reshape(height, width)

print(f"Map size: {width}x{height} tiles = {width*8}x{height*8} pixels")
print(f"Unique tiles used: {len(np.unique(tiles))}")
```

### Parse sound effects

Each SFX slot contains 32 notes. The hex format encodes pitch, waveform, volume, and effect per note:

```python
row = ds["train"][0]
sfx_lines = row["sfx_hex"].strip().split("\n")

for i, line in enumerate(sfx_lines[:4]):  # First 4 SFX
    # First 2 chars: editor mode, next 2: speed, next 2: loop start, next 2: loop end
    header = line[:8]
    notes = line[8:]
    speed = int(header[2:4], 16)
    print(f"SFX {i}: speed={speed}, {len(notes)//5} notes")
    # Each note is 5 hex chars: pitch(2) waveform(1) volume(1) effect(1)
```

### Parse music patterns

Music patterns sequence up to 4 SFX channels together:

```python
row = ds["train"][0]
music_lines = row["music_hex"].strip().split("\n")

for i, line in enumerate(music_lines[:8]):
    # Each line: 2 flag chars + 4 channel values (2 hex chars each).
    # Native .p8 files put a space after the flag byte, so strip it if present.
    flags = int(line[:2], 16)
    chan_hex = line[2:].strip()
    channels = [int(chan_hex[j*2:j*2+2], 16) for j in range(4)]
    # Flag bits 0-2: loop start, loop end, stop at end of pattern.
    # Channel byte: bits 0-5 are the SFX index (0-63); bit 6 set (0x40+) means off.
    ch_display = [f"sfx {c & 0x3f}" if c < 0x40 else "---" for c in channels]
    print(f"Pattern {i}: {' | '.join(ch_display)}  flags={flags:08b}")
```

## Quality Distribution

<p align="center">
  <img src="quality_analysis.png" alt="Quality analysis plots" width="900">
</p>

| Star Tier | Count | % |
|-----------|------:|---:|
| 0 stars | 858 | 7.8% |
| 1-4 stars | 4,049 | 36.8% |
| 5-9 stars | 2,653 | 24.1% |
| 10-49 stars | 2,868 | 26.0% |
| 50-99 stars | 206 | 1.9% |
| 100-199 stars | 246 | 2.2% |
| 200+ stars | 133 | 1.2% |

The tier counts sum to 11,013 and the percentages are relative to that total, so the table appears to cover every indexed cart rather than only the 10,967 in the final dataset.

**Suggested quality tiers for downstream use:**

- **Featured** (200+ stars): 133 exceptional games such as Celeste, POOM, and Porklike
- **High quality** (50+ stars): 585 polished, community-recognized games
- **Solid** (10+ stars): 3,453 games that found an audience
- **Full corpus**: 10,967 games including experiments, demos, tools, and art

Use `is_duplicate=False` to exclude 53 carts that are exact code copies of higher-starred originals (mostly Celeste/Jelpi mods).

## About PICO-8

[PICO-8](https://www.lexaloffle.com/pico-8.php) is a fantasy console by [Lexaloffle](https://www.lexaloffle.com/), a deliberately constrained environment for making tiny games:

- **Display**: 128x128 pixels, 16 fixed colors
- **Code**: Lua subset, max 8,192 tokens / 65,535 characters
- **Sprites**: 256 8x8 sprites on a 128x128 sheet
- **Map**: 128x32 tiles (or 128x64 sharing sprite memory)
- **Sound**: 64 SFX slots, 64 music patterns, 4 channels
- **Input**: 6 buttons (directional pad + O/X)

Every game's complete source code, art, sound, and music fits in a single `.p8.png` file: a 160x205 PNG with data steganographically encoded in the least significant bits.

## How This Dataset Was Built

1. **Index**: Scraped all ~368 pages of the [PICO-8 BBS Releases](https://www.lexaloffle.com/bbs/?cat=7#sub=2) category
2. **Download**: Fetched 10,999 `.p8.png` cart files (14 were unavailable)
3. **Parse**: Converted each cart to `.p8` text format using [shrinko8](https://github.com/thisismypassport/shrinko8), then extracted all sections
4. **Render**: Generated spritesheet and map PNGs using the PICO-8 palette
5. **Quality**: Computed entry point flags, duplicate detection, and star/engagement metadata

Data was collected in March 2026. The scraper respected rate limits (1.5s between requests) and identified itself via User-Agent.

## Citation

```bibtex
@dataset{pico8games2026,
  title={PICO-8 Games Dataset},
  author={Fraser Greenlee},
  year={2026},
  url={https://huggingface.co/datasets/Fraser/pico-8-games},
  note={10,967 PICO-8 cartridges from the Lexaloffle BBS}
}
```

## License

71% of carts in this dataset are released under [CC4-BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/) by their authors. The remaining 29% have no explicit license specified. Filter on the `license` column for your use case.

The dataset metadata and tooling are released under CC-BY-NC-SA-4.0.
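
Returning to the spritesheet format described under Extracting Components: the 16-color palette is listed there but never applied to the raw hex. The sketch below shows one way to turn a single sprite from `gfx_hex` into RGB values. The helper names `decode_sprite` and `sprite_to_rgb` are hypothetical, and the code assumes `gfx_hex` is newline-separated rows of 128 hex digits (one digit per pixel); rows missing from the end are treated as color 0, which is also an assumption. A synthetic string stands in for `row["gfx_hex"]` so the example runs without downloading the dataset.

```python
# PICO-8's 16-color palette (RGB), same values as in the tile map example.
PICO8_PALETTE = [
    (0, 0, 0), (29, 43, 83), (126, 37, 83), (0, 135, 81),
    (171, 82, 54), (95, 87, 79), (194, 195, 199), (255, 241, 232),
    (255, 0, 77), (255, 163, 0), (255, 236, 39), (0, 228, 54),
    (41, 173, 255), (131, 118, 156), (255, 119, 168), (255, 204, 170),
]


def decode_sprite(gfx_hex: str, sprite_id: int) -> list[list[int]]:
    """Return an 8x8 grid of palette indices for one sprite (hypothetical helper)."""
    lines = gfx_hex.strip().split("\n")
    sx = (sprite_id % 16) * 8   # left edge of the sprite, in pixels
    sy = (sprite_id // 16) * 8  # top edge of the sprite, in pixels
    grid = []
    for y in range(8):
        # Assumption: rows absent from the hex are blank (color 0).
        line = lines[sy + y] if sy + y < len(lines) else "0" * 128
        cells = line[sx:sx + 8].ljust(8, "0")
        grid.append([int(ch, 16) for ch in cells])
    return grid


def sprite_to_rgb(indices: list[list[int]]) -> list[list[tuple[int, int, int]]]:
    """Map palette indices to RGB tuples (hypothetical helper)."""
    return [[PICO8_PALETTE[i] for i in row] for row in indices]


# Synthetic stand-in for row["gfx_hex"]: pixel (x, y) has color (x + y) % 16.
fake_gfx = "\n".join(
    "".join(f"{(x + y) % 16:x}" for x in range(128)) for y in range(128)
)

indices = decode_sprite(fake_gfx, 42)
rgb = sprite_to_rgb(indices)
print(indices[0])
print(rgb[0][7])
```

With a real row, passing `row["gfx_hex"]` in place of `fake_gfx` should give the same grid that the sprite-parsing snippet prints, but as integers that can be fed to an image library of your choice.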
Provider:
Fraser