InterwebAlchemy/pgn-lichess-puzzle-dataset
Source: Hugging Face. Updated 2026-03-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/InterwebAlchemy/pgn-lichess-puzzle-dataset
Description:
---
license: cc0-1.0
language:
- en
tags:
- chess
- pgn
- puzzles
- tactics
pretty_name: Chess Puzzles with PGN Context
size_categories:
- 1K<n<10K
---
# Chess Puzzles with PGN Context
Tactical chess puzzles drawn from the [Lichess Open Puzzle Database](https://database.lichess.org/#puzzles),
augmented with full PGN game context reconstructed from the source Lichess games.
Each record presents a middlegame position as a PGN move sequence, the same format used
to train PGN language models like [kn1ght](https://github.com/InterwebAlchemy/kn1ght), together
with the engine-validated best move as the label.
## Dataset Summary
- **5,000 puzzles** with reconstructed PGN context
- Rating range: 1200–1900 (mean: 1540)
- Themes: `middlegame`, `short`, `advantage`, `crushing`, `long`, `mate`, `fork`, `kingsideAttack` (middlegame only; opening/endgame excluded)
- Splits: 80% train / 10% validation / 10% test
## Schema
| Column | Type | Description |
|---|---|---|
| `puzzle_id` | string | Lichess puzzle ID |
| `game_id` | string | Lichess game ID (source of PGN context) |
| `rating` | int32 | Puzzle difficulty (Lichess Glicko-2 rating) |
| `themes` | list[string] | Tactical theme tags (e.g. `fork`, `pin`, `skewer`) |
| `pgn_context` | string | PGN move text up to (not including) the puzzle move |
| `fen` | string | Board position at start of puzzle in FEN notation |
| `best_move_uci` | string | Correct first move in UCI notation |
| `best_move_san` | string | Correct first move in SAN notation |
## Usage
```python
from datasets import load_dataset
ds = load_dataset("InterwebAlchemy/pgn-lichess-puzzle-dataset")
# Each example:
# {'pgn_context': '1.e4 e5 2.Nf3 Nc6 ... 18.Rxd4',
# 'best_move_san': 'Nf6+',
# 'rating': 1487,
# 'themes': ['fork', 'middlegame']}
```
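To work with a subset of puzzles, for example one theme within a rating band, a small predicate over the `themes` and `rating` columns is usually enough. The sketch below is illustrative only: the helper name `matches` and the thresholds are assumptions, not part of the dataset.

```python
from typing import Any, Mapping


def matches(example: Mapping[str, Any], theme: str,
            min_rating: int, max_rating: int) -> bool:
    """Return True if the puzzle carries `theme` and its rating is in range.

    `matches` is a hypothetical helper; it only relies on the `themes`
    and `rating` columns described in the schema.
    """
    return (theme in example["themes"]
            and min_rating <= example["rating"] <= max_rating)


# A record shaped like the example in the usage snippet.
sample = {"themes": ["fork", "middlegame"], "rating": 1487}
print(matches(sample, "fork", 1400, 1600))
```

With a loaded dataset, the same predicate could be passed to `Dataset.filter`, e.g. `ds["train"].filter(lambda ex: matches(ex, "fork", 1400, 1600))`, assuming the training split is named `train`.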
## How PGN context is reconstructed
For each Lichess puzzle:
1. The source game PGN is fetched from `lichess.org/api/game/<id>`
2. The game is replayed move by move until the board FEN matches the puzzle FEN
3. The move-text up to that point is stored as `pgn_context`
4. The first solution move (UCI → SAN) is stored as `best_move_san`
Puzzles where the FEN could not be located in the source game are discarded.
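The replay steps above can be sketched with the `python-chess` library. This is a rough illustration under stated assumptions, not the code used to build the dataset: the function names are hypothetical, the game PGN is assumed to be already downloaded, and positions are compared on piece placement, side to move, and castling rights only (en passant and move counters are ignored to avoid spurious mismatches).

```python
import io
from typing import Optional

import chess
import chess.pgn


def position_key(fen: str) -> str:
    """Keep placement, side to move, and castling; drop the other FEN fields."""
    return " ".join(fen.split()[:3])


def reconstruct_context(game_pgn: str, puzzle_fen: str,
                        solution_uci: str) -> Optional[dict]:
    """Replay the game until the puzzle position appears.

    Returns None when the position is never reached (the puzzle would be
    discarded) or when the solution move is not legal in that position.
    """
    game = chess.pgn.read_game(io.StringIO(game_pgn))
    if game is None:
        return None
    board = game.board()
    target = position_key(puzzle_fen)
    tokens = []
    moves = iter(game.mainline_moves())
    # Stops at the first occurrence of the position in the game.
    while position_key(board.fen()) != target:
        move = next(moves, None)
        if move is None:
            return None  # FEN not found in the source game
        san = board.san(move)  # SAN must be computed before pushing
        if board.turn == chess.WHITE:
            tokens.append(f"{board.fullmove_number}.{san}")
        else:
            tokens.append(san)
        board.push(move)
    best = chess.Move.from_uci(solution_uci)
    if best not in board.legal_moves:
        return None
    return {
        "pgn_context": " ".join(tokens),
        "fen": board.fen(),
        "best_move_uci": solution_uci,
        "best_move_san": board.san(best),  # UCI -> SAN conversion
    }
```

The move-text format (`1.e4 e5 2.Nf3 ...`) follows the example shown in the usage snippet; whether the published records use exactly this spacing is an assumption.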
## Intended use
Evaluating and fine-tuning PGN language models on tactical positions. The `pgn_context`
field can be fed directly to any model that generates chess moves as PGN continuations.
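One simple way to score a model on these records is top-1 exact match against `best_move_san`. The sketch below assumes a hypothetical `predict` callable that maps a `pgn_context` string to a single SAN move; the metric itself is an assumption here, not an evaluation protocol defined by the dataset. Check and mate suffixes are stripped before comparison so that `Nf6` and `Nf6+` count as the same move.

```python
from typing import Any, Callable, Iterable, Mapping


def normalize_san(move: str) -> str:
    """Strip whitespace, check/mate markers, and annotation glyphs."""
    return move.strip().rstrip("+#!?")


def top1_accuracy(examples: Iterable[Mapping[str, Any]],
                  predict: Callable[[str], str]) -> float:
    """Fraction of puzzles where the predicted move equals the labeled one.

    `predict` is a placeholder for any model wrapper that returns one
    SAN move for a given PGN context.
    """
    records = list(examples)
    if not records:
        return 0.0
    hits = sum(
        normalize_san(predict(ex["pgn_context"])) == normalize_san(ex["best_move_san"])
        for ex in records
    )
    return hits / len(records)
```

Exact match is strict: it gives no credit for alternative moves that may also win, so it is best read as a lower bound on tactical strength.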
## Licensing
This dataset is derived from the [Lichess Open Database](https://database.lichess.org/),
which is released under [CC0 1.0 (Public Domain)](https://creativecommons.org/publicdomain/zero/1.0/).
This derived dataset is also released under CC0 1.0.
Provider:
InterwebAlchemy



