Fraser/pico-8-games
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/Fraser/pico-8-games
Resource description:
---
license: cc-by-nc-sa-4.0
task_categories:
- text-generation
- image-to-text
language:
- en
tags:
- pico-8
- games
- pixel-art
- lua
- code-generation
- retro
- fantasy-console
- sprites
- chiptune
- game-development
pretty_name: "PICO-8 Games"
size_categories:
- 10K<n<100K
---
# PICO-8 Games Dataset
<p align="center">
<img src="showcase.gif" alt="PICO-8 Games Showcase" width="384">
</p>
**The first multimodal dataset of PICO-8 games.** 10,967 cartridges scraped from the [Lexaloffle BBS](https://www.lexaloffle.com/bbs/?cat=7#sub=2), each decomposed into Lua source code, pixel-art spritesheets, tile maps, sound effects, music patterns, and metadata.
<p align="center">
<img src="hero_grid.png" alt="Top 48 PICO-8 games by star count" width="800">
<br>
<em>Label screenshots from the top 48 games by star count</em>
</p>
## What's Inside
Every PICO-8 cartridge is a self-contained game packed into a single file. This dataset cracks each one open into its component parts:
<p align="center">
<img src="modalities.png" alt="Cart modalities breakdown" width="800">
<br>
<em>The anatomy of a PICO-8 cartridge (Celeste by Matt Thorson & Noel Berry)</em>
</p>
| Modality | Format | Description |
|----------|--------|-------------|
| **Lua source code** | String (up to 65,535 chars) | The complete game logic in PICO-8's Lua dialect |
| **Spritesheet** | 128x128 PNG (16 colors) | Up to 256 8x8 sprites — all the game's pixel art |
| **Label image** | 128x128 PNG | The developer's chosen screenshot representing the game |
| **Tile map** | 128x32 grid of tile indices | The spatial layout of game levels |
| **Sound effects** | 64 SFX slots, hex-encoded | Each SFX has 32 notes with pitch, waveform, volume, and effects |
| **Music patterns** | 64 patterns, hex-encoded | 4-channel music sequencer data referencing SFX slots |
| **Sprite flags** | 256 bytes | 8 boolean flags per sprite, used for collision/behavior |
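The sprite-flag bytes in the last row can be unpacked with plain bit operations. The sketch below assumes flag 0 is the least significant bit of each byte (as in PICO-8's `fget`); the helper names `decode_sprite_flags` and `sprites_with_flag` are hypothetical and not part of the dataset tooling.

```python
def decode_sprite_flags(flag_byte: int) -> list[bool]:
    """Return the 8 flags of one sprite, flag 0 (least significant bit) first."""
    return [bool((flag_byte >> bit) & 1) for bit in range(8)]


def sprites_with_flag(sprite_flags: list[int], flag: int) -> list[int]:
    """Return the ids of all sprites that have the given flag set."""
    return [i for i, b in enumerate(sprite_flags) if (b >> flag) & 1]


# Synthetic example: 256 sprites, two of them flagged
demo_flags = [0] * 256
demo_flags[1] = 0b00000001  # sprite 1: flag 0 only
demo_flags[2] = 0b10000001  # sprite 2: flags 0 and 7

print(decode_sprite_flags(demo_flags[2]))
print(sprites_with_flag(demo_flags, 0))
```

With a real row, passing `row["sprite_flags"]` to `sprites_with_flag` would list, for example, every sprite a game marks as solid, if that game uses flag 0 for collision.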
## Dataset Schema
```python
{
    # --- Metadata ---
    "cart_id": str,  # Unique cart identifier (lid)
    "title": str,  # Game title
    "author": str,  # Author username
    "description": str,  # Author's description (when available)
    "tags": list[str],  # User-applied tags ("platformer", "shooter", etc.)
    "stars": int,  # Community star/like count (primary quality signal)
    "reply_count": int,  # Number of thread replies (engagement signal)
    "date_posted": str,  # Original post date (YYYY-MM-DD HH:MM:SS)
    "date_updated": str,  # Last update date
    "license": str,  # "CC4-BY-NC-SA" or "" (71% are CC4-licensed)
    "thread_id": int,  # BBS thread ID
    "thread_url": str,  # Direct link to the BBS thread

    # --- Code ---
    "lua_code": str,  # Complete Lua source code
    "token_count": int,  # Approximate PICO-8 token count
    "char_count": int,  # Character count of Lua code

    # --- Quality flags ---
    "has_init": bool,  # Has _init() function
    "has_update": bool,  # Has _update() or _update60() function
    "has_draw": bool,  # Has _draw() function
    "is_duplicate": bool,  # Exact code duplicate of a higher-starred cart

    # --- Visual data ---
    "spritesheet": Image,  # 128x128 RGB spritesheet (PICO-8's 16-color palette)
    "label_image": Image,  # 128x128 cart label screenshot (if captured by dev)
    "map_image": Image,  # Rendered tile map using spritesheet (if map data exists)

    # --- Raw data (for lossless roundtripping) ---
    "gfx_hex": str,  # Raw spritesheet as hex digits
    "map_data": list[int],  # Raw tile indices (uint8)
    "sprite_flags": list[int],  # Raw sprite flags (uint8)
    "sfx_hex": str,  # Raw sound effect data
    "music_hex": str,  # Raw music pattern data
}
```
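The `has_init`, `has_update`, and `has_draw` flags could be recomputed from `lua_code` roughly as follows. This is a minimal sketch using regular expressions; how the dataset actually derives these flags is not documented here, and the helpers `has_callback` and `entry_point_flags` are hypothetical. A pattern match like this can be fooled by commented-out code.

```python
import re


def has_callback(lua_code: str, name: str) -> bool:
    """True if the code defines `function name(` or assigns `name = function`."""
    pattern = rf"(?:\bfunction\s+{name}\s*\(|(?<![\w.]){name}\s*=\s*function\b)"
    return re.search(pattern, lua_code) is not None


def entry_point_flags(lua_code: str) -> dict[str, bool]:
    """Recompute the three entry-point flags for one cart's source."""
    return {
        "has_init": has_callback(lua_code, "_init"),
        "has_update": has_callback(lua_code, "_update")
        or has_callback(lua_code, "_update60"),
        "has_draw": has_callback(lua_code, "_draw"),
    }


demo_code = "function _init()\n x=0\nend\nfunction _update60()\n x+=1\nend\n"
print(entry_point_flags(demo_code))
```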
## Quick Start
```python
from datasets import load_dataset

ds = load_dataset("Fraser/pico-8-games")

# Browse the top games
top = ds["train"].sort("stars", reverse=True)
for row in top.select(range(10)):
    print(f"{row['title']:30s} {row['stars']:4d} stars by {row['author']}")

# Filter to high-quality, unique, CC4-licensed games
hq = ds["train"].filter(lambda x: (
    x["stars"] >= 10
    and not x["is_duplicate"]
    and x["has_init"] and x["has_update"] and x["has_draw"]
    and x["license"] == "CC4-BY-NC-SA"
))
print(f"High-quality subset: {len(hq)} games")
```
## Running a Cart
PICO-8 carts are playable games. To run one:
**1. In the browser** — Visit the `thread_url` for any cart and click "Play" on the BBS page.
**2. With PICO-8** ($15, [lexaloffle.com](https://www.lexaloffle.com/pico-8.php)) — Reconstruct the `.p8` text file:
```python
row = ds["train"][0]

# Build the .p8 file from dataset columns
p8_text = "pico-8 cartridge // http://www.pico-8.com\nversion 42\n"
p8_text += f"__lua__\n{row['lua_code']}\n"
p8_text += f"__gfx__\n{row['gfx_hex']}\n"

# Add map data (convert byte array back to hex lines, 128 tiles per line)
map_bytes = row["map_data"]
if map_bytes:
    p8_text += "__map__\n"
    for y in range(0, len(map_bytes), 128):
        line = "".join(f"{b:02x}" for b in map_bytes[y:y+128])
        p8_text += line + "\n"

# Add sprite flags
flags = row["sprite_flags"]
if flags:
    p8_text += "__gff__\n"
    p8_text += "".join(f"{b:02x}" for b in flags) + "\n"

p8_text += f"__sfx__\n{row['sfx_hex']}\n"
p8_text += f"__music__\n{row['music_hex']}\n"

# Write as UTF-8 so any non-ASCII glyphs in the Lua source survive
with open("game.p8", "w", encoding="utf-8") as f:
    f.write(p8_text)
# Then: pico8 -run game.p8
```
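To check a reconstructed file, the `.p8` text can be split back into its sections and compared with the dataset columns. The sketch below assumes each section starts with a marker line such as `__lua__` on its own line, as in the reconstruction above; `split_p8_sections` is a hypothetical helper, and a cart whose Lua source contains a line that is exactly one of these markers would be split incorrectly.

```python
KNOWN_SECTIONS = {"lua", "gfx", "label", "gff", "map", "sfx", "music"}


def split_p8_sections(p8_text: str) -> dict[str, str]:
    """Split .p8 text into {section name: body}, ignoring the two header lines."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in p8_text.splitlines():
        marker = line.strip()
        name = marker[2:-2]
        if marker.startswith("__") and marker.endswith("__") and name in KNOWN_SECTIONS:
            current = name
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body) for name, body in sections.items()}


demo_p8 = (
    "pico-8 cartridge // http://www.pico-8.com\nversion 42\n"
    "__lua__\nprint(1)\nprint(2)\n"
    "__gfx__\n0077\n"
    "__sfx__\n0001\n"
)
parts = split_p8_sections(demo_p8)
print(sorted(parts))
```

For a roundtrip check, `split_p8_sections(p8_text)["lua"]` should equal `row["lua_code"]` up to trailing newlines.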
## Extracting Components
### Render the spritesheet
Each cart's spritesheet is a 128x128 image using PICO-8's fixed 16-color palette. It's stored both as a rendered PNG and as raw hex for lossless access.
```python
# The spritesheet is already a PIL Image
row = ds["train"][0]
row["spritesheet"].save("sprites.png")

# Or parse individual 8x8 sprites from the hex data
gfx = row["gfx_hex"]
lines = gfx.strip().split("\n")

# Each hex digit = one pixel (4-bit color index)
# Sprite 0 is at (0,0), sprite 1 at (8,0), ..., sprite 16 at (0,8)
sprite_id = 42
sx = (sprite_id % 16) * 8
sy = (sprite_id // 16) * 8
for y in range(8):
    row_pixels = lines[sy + y][sx:sx + 8]
    print(row_pixels)  # e.g. "00770700", where each digit is a color index
```
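The color indices can be mapped to RGB values with the fixed 16-color palette. The following sketch uses only the standard library and assumes `gfx_hex` holds 128 lines of 128 hex digits; the helpers `sprite_indices` and `sprite_rgb` are hypothetical, and the demo sheet is synthetic rather than taken from a real cart.

```python
PICO8_PALETTE = [
    (0, 0, 0), (29, 43, 83), (126, 37, 83), (0, 135, 81),
    (171, 82, 54), (95, 87, 79), (194, 195, 199), (255, 241, 232),
    (255, 0, 77), (255, 163, 0), (255, 236, 39), (0, 228, 54),
    (41, 173, 255), (131, 118, 156), (255, 119, 168), (255, 204, 170),
]


def sprite_indices(gfx_hex: str, sprite_id: int) -> list[list[int]]:
    """Return an 8x8 grid of palette indices for one sprite."""
    lines = gfx_hex.strip().split("\n")
    sx = (sprite_id % 16) * 8
    sy = (sprite_id // 16) * 8
    return [[int(ch, 16) for ch in lines[sy + y][sx:sx + 8]] for y in range(8)]


def sprite_rgb(gfx_hex: str, sprite_id: int) -> list[list[tuple[int, int, int]]]:
    """Return the same sprite as an 8x8 grid of (r, g, b) tuples."""
    return [[PICO8_PALETTE[i] for i in row] for row in sprite_indices(gfx_hex, sprite_id)]


# Synthetic sheet: all black except sprite 17, whose rows read 0..7
blank = "0" * 128
sheet = [blank] * 128
for y in range(8, 16):
    sheet[y] = blank[:8] + "01234567" + blank[16:]
demo_gfx = "\n".join(sheet)

print(sprite_indices(demo_gfx, 17)[0])
print(sprite_rgb(demo_gfx, 17)[0][7])
```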
### Extract the tile map
The map is a grid of tile indices referencing sprites from the spritesheet:
```python
import numpy as np
from PIL import Image
PICO8_PALETTE = [
(0,0,0), (29,43,83), (126,37,83), (0,135,81),
(171,82,54), (95,87,79), (194,195,199), (255,241,232),
(255,0,77), (255,163,0), (255,236,39), (0,228,54),
(41,173,255), (131,118,156), (255,119,168), (255,204,170),
]
row = ds["train"][0]
map_data = row["map_data"] # flat list of uint8 tile indices
# Reshape to 32 rows x 128 columns (some carts use 64 rows, sharing sprite memory)
width = 128
height = len(map_data) // width
tiles = np.array(map_data).reshape(height, width)
print(f"Map size: {width}x{height} tiles = {width*8}x{height*8} pixels")
print(f"Unique tiles used: {len(np.unique(tiles))}")
```
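Since each map cell is a sprite id, a region of the map can be rendered by copying the matching 8x8 block from the spritesheet. The sketch below works on hex strings rather than images, so it needs no extra packages; it assumes a 128-tile-wide map and a full 128-line `gfx_hex`, and `render_map_region` is a hypothetical helper, not necessarily how `map_image` was produced.

```python
def render_map_region(map_data, gfx_hex, x0, y0, w, h, map_width=128):
    """Render w x h map tiles starting at (x0, y0) as rows of color-index digits."""
    gfx = gfx_hex.strip().split("\n")
    rows = []
    for ty in range(y0, y0 + h):
        for py in range(8):  # 8 pixel rows per tile row
            pixels = ""
            for tx in range(x0, x0 + w):
                tile = map_data[ty * map_width + tx]
                sx, sy = (tile % 16) * 8, (tile // 16) * 8
                pixels += gfx[sy + py][sx:sx + 8]
            rows.append(pixels)
    return rows


# Synthetic data: sprite 1 is solid color 7, the map uses it in its top-left cell
blank = "0" * 128
sheet = [blank[:8] + "7" * 8 + blank[16:] if y < 8 else blank for y in range(128)]
demo_gfx = "\n".join(sheet)
demo_map = [0] * (128 * 32)
demo_map[0] = 1

for line in render_map_region(demo_map, demo_gfx, 0, 0, 2, 1):
    print(line)
```

Each output digit can then be turned into an RGB pixel with the `PICO8_PALETTE` list defined above.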
### Parse sound effects
Each SFX slot contains 32 notes. The hex format encodes pitch, waveform, volume, and effect per note:
```python
row = ds["train"][0]
sfx_lines = row["sfx_hex"].strip().split("\n")
for i, line in enumerate(sfx_lines[:4]):  # First 4 SFX
    # First 2 chars: editor mode, next 2: speed, next 2: loop start, next 2: loop end
    header = line[:8]
    notes = line[8:]
    speed = int(header[2:4], 16)
    # Each note is 5 hex chars: pitch(2) waveform(1) volume(1) effect(1)
    print(f"SFX {i}: speed={speed}, {len(notes)//5} notes")
```
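Going one step further, each 5-character note can be decoded into its fields. This is a minimal sketch that follows the layout described in the comments above (8 header characters, then 5 hex characters per note); the `Note` tuple and `parse_sfx_line` are hypothetical names.

```python
from typing import NamedTuple


class Note(NamedTuple):
    pitch: int      # 2 hex chars
    waveform: int   # 1 hex char
    volume: int     # 1 hex char
    effect: int     # 1 hex char


def parse_sfx_line(line: str) -> tuple[dict[str, int], list[Note]]:
    """Split one SFX line into its header fields and its list of notes."""
    header, body = line[:8], line[8:]
    meta = {
        "editor_mode": int(header[0:2], 16),
        "speed": int(header[2:4], 16),
        "loop_start": int(header[4:6], 16),
        "loop_end": int(header[6:8], 16),
    }
    notes = [
        Note(int(body[i:i + 2], 16), int(body[i + 2], 16),
             int(body[i + 3], 16), int(body[i + 4], 16))
        for i in range(0, len(body) - 4, 5)
    ]
    return meta, notes


# Synthetic SFX line: speed 16, one audible note followed by 31 empty ones
demo_line = "01100000" + "1a375" + "00000" * 31
meta, notes = parse_sfx_line(demo_line)
print(meta["speed"], len(notes), notes[0])
```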
### Parse music patterns
Music patterns sequence up to 4 SFX channels together:
```python
row = ds["train"][0]
music_lines = row["music_hex"].strip().split("\n")
for i, line in enumerate(music_lines[:8]):
    # Each line: 2 flag chars + 4 channel values (2 hex chars each).
    # The .p8 text format puts a space after the flag byte, so drop any spaces first.
    compact = line.replace(" ", "")
    flags = int(compact[0:2], 16)
    channels = [int(compact[2 + j * 2:4 + j * 2], 16) for j in range(4)]
    # Flag byte bits 0-2: loop start, loop end, stop at end of pattern.
    # Channel byte: bits 0-5 are the SFX index (0-63); bit 6 set means the channel is off.
    ch_display = ["---" if c & 0x40 else f"sfx {c & 0x3f}" for c in channels]
    print(f"Pattern {i}: {' | '.join(ch_display)} flags={flags:08b}")
```
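The pattern flags can also be unpacked into named fields. The sketch below assumes the `.p8` text conventions: flag bits 0, 1, and 2 mean loop start, loop end, and stop, and a channel byte with bit 6 (`0x40`) set is silent. These bit meanings come from community descriptions of the file format rather than from this dataset, and `parse_music_line` is a hypothetical helper. It accepts lines with or without the space after the flag byte.

```python
def parse_music_line(line: str) -> dict:
    """Decode one music pattern line into loop flags and per-channel SFX ids."""
    compact = line.replace(" ", "")
    flags = int(compact[0:2], 16)
    channels = [int(compact[2 + 2 * j:4 + 2 * j], 16) for j in range(4)]
    return {
        "loop_start": bool(flags & 1),
        "loop_end": bool(flags & 2),
        "stop": bool(flags & 4),
        # None marks a silent channel; otherwise the SFX slot (0-63)
        "sfx": [None if c & 0x40 else c & 0x3F for c in channels],
    }


print(parse_music_line("01 00014344"))
print(parse_music_line("0441424344"))
```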
## Quality Distribution
<p align="center">
<img src="quality_analysis.png" alt="Quality analysis plots" width="900">
</p>
| Star Tier | Count | % |
|-----------|------:|---:|
| 0 stars | 858 | 7.8% |
| 1-4 stars | 4,049 | 36.8% |
| 5-9 stars | 2,653 | 24.1% |
| 10-49 stars | 2,868 | 26.0% |
| 50-99 stars | 206 | 1.9% |
| 100-199 stars | 246 | 2.2% |
| 200+ stars | 133 | 1.2% |
**Suggested quality tiers for downstream use:**
- **Featured** (200+ stars): 133 exceptional games — Celeste, POOM, Porklike, etc.
- **High quality** (50+ stars): 585 polished, community-recognized games
- **Solid** (10+ stars): 3,453 games that found an audience
- **Full corpus**: 10,967 games including experiments, demos, tools, and art
Use `is_duplicate=False` to exclude 53 carts that are exact code copies of higher-starred originals (mostly Celeste/Jelpi mods).
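The tiers above translate into a simple threshold check. In this sketch the tier keys (`featured`, `high_quality`, `solid`, `full_corpus`) and the helper `in_tier` are hypothetical names; only the star thresholds come from the list above. Tiers are cumulative, so a 200+ star game also counts as high quality and solid.

```python
TIER_MIN_STARS = {"featured": 200, "high_quality": 50, "solid": 10, "full_corpus": 0}


def in_tier(row: dict, tier: str, drop_duplicates: bool = True) -> bool:
    """True if the row meets the tier's star threshold (optionally skipping duplicates)."""
    if drop_duplicates and row.get("is_duplicate", False):
        return False
    return row["stars"] >= TIER_MIN_STARS[tier]


# Synthetic rows, not real dataset entries
demo_rows = [
    {"title": "a", "stars": 250, "is_duplicate": False},
    {"title": "b", "stars": 60, "is_duplicate": True},
    {"title": "c", "stars": 12, "is_duplicate": False},
    {"title": "d", "stars": 3, "is_duplicate": False},
]
print([r["title"] for r in demo_rows if in_tier(r, "solid")])
```

The same predicate can be passed to `Dataset.filter`, for example `ds["train"].filter(lambda r: in_tier(r, "high_quality"))`.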
## About PICO-8
[PICO-8](https://www.lexaloffle.com/pico-8.php) is a fantasy console by [Lexaloffle](https://www.lexaloffle.com/) — a deliberately constrained environment for making tiny games:
- **Display**: 128x128 pixels, 16 fixed colors
- **Code**: Lua subset, max 8,192 tokens / 65,535 characters
- **Sprites**: 256 8x8 sprites on a 128x128 sheet
- **Map**: 128x32 tiles (or 128x64 sharing sprite memory)
- **Sound**: 64 SFX slots, 64 music patterns, 4 channels
- **Input**: 6 buttons (directional pad + O/X)
Every game's complete source code, art, sound, and music fits in a single `.p8.png` file — a 160x205 PNG with data steganographically encoded in the least significant bits.
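As a rough illustration of that encoding, the sketch below rebuilds one data byte per pixel from the two least significant bits of each channel. The channel order (alpha, red, green, blue, from most to least significant bit pair) is an assumption based on community descriptions of the `.p8.png` format, not something this dataset card specifies; `decode_pixel` and `decode_pixels` are hypothetical helpers. At one byte per pixel, a 160x205 image holds 32,800 bytes.

```python
def decode_pixel(r: int, g: int, b: int, a: int) -> int:
    """Combine the 2 low bits of each channel into one byte (assumed order: A, R, G, B)."""
    return ((a & 3) << 6) | ((r & 3) << 4) | ((g & 3) << 2) | (b & 3)


def decode_pixels(pixels) -> bytes:
    """Decode an iterable of (r, g, b, a) tuples into raw cart bytes."""
    return bytes(decode_pixel(*p) for p in pixels)


# Synthetic pixel whose low bit pairs are R=11, G=01, B=00, A=10
demo_pixel = (0b11111111, 0b10101001, 0b00000100, 0b11111110)
print(hex(decode_pixel(*demo_pixel)))
print(160 * 205)  # bytes available in one cart image
```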
## How This Dataset Was Built
1. **Index**: Scraped all ~368 pages of the [PICO-8 BBS Releases](https://www.lexaloffle.com/bbs/?cat=7#sub=2) category
2. **Download**: Fetched 10,999 `.p8.png` cart files (14 were unavailable)
3. **Parse**: Converted each cart to `.p8` text format using [shrinko8](https://github.com/thisismypassport/shrinko8), then extracted all sections
4. **Render**: Generated spritesheet and map PNGs using the PICO-8 palette
5. **Quality**: Computed entry point flags, duplicate detection, and star/engagement metadata
Data was collected in March 2026. The scraper respected rate limits (1.5s between requests) and identified itself via User-Agent.
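The duplicate detection in step 5 could be approximated by hashing each cart's source and keeping the highest-starred copy. This is a sketch under that assumption; the dataset's actual method is not described beyond "exact code duplicate of a higher-starred cart", and `mark_duplicates` is a hypothetical helper. On a star tie, the earlier row is treated as the original.

```python
import hashlib


def mark_duplicates(rows: list[dict]) -> list[bool]:
    """Return one is_duplicate flag per row, keeping the highest-starred copy of each source."""
    digests = [hashlib.sha256(r["lua_code"].encode("utf-8")).hexdigest() for r in rows]
    best: dict[str, int] = {}  # digest -> index of the row kept as the original
    for i, digest in enumerate(digests):
        if digest not in best or rows[i]["stars"] > rows[best[digest]]["stars"]:
            best[digest] = i
    return [best[d] != i for i, d in enumerate(digests)]


# Synthetic rows: the first two share identical code
demo_rows = [
    {"lua_code": "print(1)", "stars": 1},
    {"lua_code": "print(1)", "stars": 5},
    {"lua_code": "print(2)", "stars": 0},
]
print(mark_duplicates(demo_rows))
```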
## Citation
```bibtex
@dataset{pico8games2026,
title={PICO-8 Games Dataset},
author={Fraser Greenlee},
year={2026},
url={https://huggingface.co/datasets/Fraser/pico-8-games},
note={10,967 PICO-8 cartridges from the Lexaloffle BBS}
}
```
## License
71% of carts in this dataset are released under [CC4-BY-NC-SA](https://creativecommons.org/licenses/by-nc-sa/4.0/) by their authors. The remaining 29% have no explicit license specified. Filter on the `license` column for your use case. The dataset metadata and tooling are released under CC-BY-NC-SA-4.0.
Provided by: Fraser



