
Chess-Nut-Engine/chess-sft-data

Source: Hugging Face. Updated 2026-03-20. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Chess-Nut-Engine/chess-sft-data
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
size_categories:
- 1M<n<10M
tags:
- chess
- sft
- instruction-tuning
- reasoning
- chess960
pretty_name: Chess SFT Training Data
configs:
- config_name: default
  data_files:
  - split: train
    path: "tier*/*.jsonl"
- config_name: tier1_perception
  data_files:
  - split: train
    path: "tier1/*.jsonl"
- config_name: tier2_rules
  data_files:
  - split: train
    path: "tier2/*.jsonl"
- config_name: tier3_tactics
  data_files:
  - split: train
    path: "tier3/*.jsonl"
- config_name: tier4_evaluation
  data_files:
  - split: train
    path: "tier4/*.jsonl"
- config_name: tier5_openings
  data_files:
  - split: train
    path: "tier5/*.jsonl"
- config_name: tier6_endgames
  data_files:
  - split: train
    path: "tier6/*.jsonl"
- config_name: tier7_planning
  data_files:
  - split: train
    path: "tier7/*.jsonl"
---

# Chess SFT Training Data

A supervised fine-tuning dataset for teaching language models to reason about chess. It covers **28 tasks** across **7 tiers** of increasing difficulty, from basic board perception through tactical analysis to endgame play and strategic planning.

Every example uses standard chess conventions: positions are encoded in [FEN](https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notation), moves in [UCI notation](https://en.wikipedia.org/wiki/Universal_Chess_Interface) (e.g. `e2e4`, `g1f3`, `a7a8q` for promotion), and board diagrams use a consistent rank-file layout. Approximately 10-20% of examples per tier use [Chess960](https://en.wikipedia.org/wiki/Fischer_random_chess) starting positions.
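The conventions above can be illustrated with a minimal sketch. It is not the dataset's own generator: the function names `fen_to_board` and `parse_uci` are hypothetical, the diagram layout (ranks 8 down to 1, `.` for empty squares, file letters on the last line) is inferred from the sample record in this card, and the indentation of the file-letter line is an assumption.

```python
from typing import Optional, Tuple


def fen_to_board(fen: str) -> str:
    """Render the piece-placement field of a FEN string as a text diagram."""
    placement = fen.split()[0]  # first FEN field: piece placement, rank 8 first
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")

    rows = []
    for rank_number, rank in zip(range(8, 0, -1), ranks):
        squares = []
        for ch in rank:
            if ch.isdigit():
                squares.extend("." * int(ch))  # a digit is a run of empty squares
            else:
                squares.append(ch)  # a letter is a piece (uppercase = White)
        if len(squares) != 8:
            raise ValueError(f"rank {rank_number} does not describe 8 squares")
        rows.append(f"{rank_number} " + " ".join(squares))

    # Assumed footer: file letters aligned under the squares.
    rows.append("  " + " ".join("abcdefgh"))
    return "\n".join(rows)


def parse_uci(move: str) -> Tuple[str, str, Optional[str]]:
    """Split a UCI move into (from_square, to_square, promotion_piece)."""
    if len(move) not in (4, 5):
        raise ValueError(f"not a UCI move: {move!r}")
    promotion = move[4] if len(move) == 5 else None
    return move[:2], move[2:4], promotion


if __name__ == "__main__":
    print(fen_to_board("r4rk1/p1q3pp/2p2b2/2P2p2/PB1P4/5N2/4Q1PP/R3R1K1 w - - 1 22"))
    print(parse_uci("a7a8q"))
```

The sketch only reads the first FEN field and does no legality checking; a chess library would be the better choice for anything beyond formatting.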

| | |
|---|---|
| **Total examples** | 1,559,583 |
| **Total size** | 1055.9 MB |
| **Format** | JSONL (chat-style `messages` with system / user / assistant turns) |
| **Eval companion** | [`Chess-Nut-Engine/chess-sft-eval`](https://huggingface.co/datasets/Chess-Nut-Engine/chess-sft-eval) |

## Tier Overview

| Tier | Category | Tasks | Examples | Size |
|------|----------|-------|----------|------|
| 1 | Perception | 5 | 440,000 | 302 MB |
| 2 | Rules | 5 | 370,000 | 240 MB |
| 3 | Tactics | 5 | 260,000 | 167 MB |
| 4 | Evaluation | 3 | 150,000 | 102 MB |
| 5 | Openings | 3 | 29,583 | 22 MB |
| 6 | Endgames | 4 | 140,000 | 88 MB |
| 7 | Planning | 3 | 170,000 | 135 MB |

## Example

Each line is a JSON object with the following schema:

```json
{
  "task": "1.1_fen_to_board",
  "tier": 1,
  "fen": "r4rk1/p1q3pp/2p2b2/2P2p2/PB1P4/5N2/4Q1PP/R3R1K1 w - - 1 22",
  "is_chess960": false,
  "messages": [
    {
      "role": "system",
      "content": "You are a chess reasoning engine. You understand chess positions in FEN notation and express all moves in UCI notation (e.g., e2e4, g1f3, a7a8q for promotion). When analyzing positions, think step by step."
    },
    {
      "role": "user",
      "content": "Position (FEN): r4rk1/p1q3pp/2p2b2/2P2p2/PB1P4/5N2/4Q1PP/R3R1K1 w - - 1 22\nShow me the board."
    },
    {
      "role": "assistant",
      "content": "8 r . . . . r k .\n7 p . q . . . p p\n6 . . p . . b . .\n5 . . P . . p . .\n4 P B . P . . . .\n3 . . . . . N . .\n2 . . . . Q . P P\n1 R . . . R . K .\n a b c d e f g h"
    }
  ],
  "metadata": {}
}
```

### Key fields

| Field | Description |
|-------|-------------|
| `task` | Task identifier (e.g. `3.2_threats`); the first digit is the tier number |
| `tier` | Tier number (1-7) |
| `fen` | The chess position in FEN notation |
| `is_chess960` | Whether this example uses a Chess960 starting position |
| `messages` | Chat-format conversation (system + user + assistant) ready for SFT |
| `metadata` | Optional task-specific metadata (e.g. puzzle rating, eval depth) |

## Loading

```python
from datasets import load_dataset

# Load everything (default config)
ds = load_dataset("Chess-Nut-Engine/chess-sft-data")

# Load a single tier
ds = load_dataset("Chess-Nut-Engine/chess-sft-data", "tier3_tactics")

# Load with streaming (recommended for full dataset)
ds = load_dataset("Chess-Nut-Engine/chess-sft-data", streaming=True)
```

## Data Sources

Training positions are sourced from:

- **[Lichess Standard Chess Games](https://huggingface.co/datasets/Lichess/standard-chess-games)**: real game positions (min Elo 1200)
- **[Lichess Chess Puzzles](https://huggingface.co/datasets/Lichess/chess-puzzles)**: tactical puzzles with known solutions
- **[Lichess Chess Openings](https://huggingface.co/datasets/Lichess/chess-openings)**: ECO-classified opening positions
- **[Lichess Position Evaluations](https://huggingface.co/datasets/Lichess/chess-position-evaluations)**: Stockfish evaluations (depth 20-40+)
- **[MATE Dataset](https://huggingface.co/datasets/OutFlankShu/MATE_DATASET)**: checkmate pattern positions
- **Syzygy Endgame Tablebases**: perfect endgame play (up to 6 pieces)
- **Polyglot Opening Books**: book moves for opening continuations
- **Chess960 Random Positions**: generated Fischer Random starting positions

All eval/benchmark FENs are excluded from training via a blocklist to prevent contamination.
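
The card does not describe how the blocklist is applied, so the following is only a sketch of one way such a filter could look when you build your own held-out set. The helper names `normalize_fen` and `filter_blocklisted` are hypothetical, and comparing FENs on their first four fields (ignoring the halfmove and fullmove counters) is an assumption, not the authors' documented rule. It relies only on the `fen` field from the schema above.

```python
import json
from typing import Dict, Iterable, Iterator, Set


def normalize_fen(fen: str) -> str:
    """Keep placement, side to move, castling and en passant; drop move counters."""
    return " ".join(fen.split()[:4])


def filter_blocklisted(lines: Iterable[str], blocklist: Set[str]) -> Iterator[Dict]:
    """Yield JSONL records whose position is not in the blocklist."""
    blocked = {normalize_fen(fen) for fen in blocklist}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if normalize_fen(record["fen"]) not in blocked:
            yield record


# Tiny in-memory demonstration with made-up records.
sample_lines = [
    '{"task": "1.4_piece_counting", "tier": 1, "fen": "8/8/8/8/8/8/8/K6k w - - 0 1"}',
    '{"task": "1.4_piece_counting", "tier": 1, "fen": "8/8/8/8/8/8/8/K6k b - - 5 40"}',
]
eval_fens = {"8/8/8/8/8/8/8/K6k w - - 3 17"}  # same position, different counters
kept = list(filter_blocklisted(sample_lines, eval_fens))
print(len(kept), kept[0]["fen"])
```

Dropping the counters makes the filter stricter than an exact string match: the same position reached at a different move number is also excluded.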

## Detailed File Listing

### Tier 1: Perception

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier1/1.1_fen_to_board.jsonl` | Render a FEN string as a human-readable board diagram | 80,000 | 56.1 MB |
| `tier1/1.2_board_to_fen.jsonl` | Convert a board diagram back to FEN notation | 80,000 | 56.6 MB |
| `tier1/1.3_piece_identification.jsonl` | Identify which piece occupies a given square | 100,000 | 58.6 MB |
| `tier1/1.4_piece_counting.jsonl` | Count pieces of a specific type/color on the board | 80,000 | 55.0 MB |
| `tier1/1.5_state_tracking.jsonl` | Extract castling rights, en passant, side to move from FEN | 100,000 | 76.2 MB |

### Tier 2: Rules

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier2/2.1_legal_move_gen.jsonl` | List all legal moves for the side to move | 100,000 | 72.4 MB |
| `tier2/2.2_piece_specific_moves.jsonl` | List legal moves for a specific piece on a given square | 80,000 | 49.0 MB |
| `tier2/2.3_move_legality_check.jsonl` | Determine whether a given move is legal | 80,000 | 49.3 MB |
| `tier2/2.4_check_detection.jsonl` | Detect if a king is in check, checkmate, or stalemate | 60,000 | 37.4 MB |
| `tier2/2.5_special_rules.jsonl` | Handle castling, en passant, promotion, and 50-move rule | 50,000 | 31.6 MB |

### Tier 3: Tactics

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier3/3.1_available_captures.jsonl` | Find all available capture moves | 60,000 | 35.1 MB |
| `tier3/3.2_threats.jsonl` | Identify pieces that are threatening enemy pieces | 50,000 | 30.7 MB |
| `tier3/3.3_attacked_defended.jsonl` | Determine which squares are attacked or defended | 60,000 | 38.6 MB |
| `tier3/3.4_tactical_patterns.jsonl` | Recognize forks, pins, skewers, and discovered attacks | 50,000 | 37.3 MB |
| `tier3/3.5_hanging_pieces.jsonl` | Find undefended pieces that can be captured for free | 40,000 | 25.2 MB |

### Tier 4: Evaluation

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier4/4.1_material_balance.jsonl` | Count material and compute the balance in centipawns | 50,000 | 35.9 MB |
| `tier4/4.2_position_evaluation.jsonl` | Evaluate a position using Stockfish-calibrated assessments | 60,000 | 40.9 MB |
| `tier4/4.3_pawn_structure.jsonl` | Analyze doubled, isolated, passed, and backward pawns | 40,000 | 25.5 MB |

### Tier 5: Openings

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier5/5.1_opening_identification.jsonl` | Name the opening from a position or move sequence | 10,000 | 7.2 MB |
| `tier5/5.2_opening_continuation.jsonl` | Suggest the next book move in a known opening line | 9,583 | 7.1 MB |
| `tier5/5.3_opening_principles.jsonl` | Explain opening principles relevant to the position | 10,000 | 8.0 MB |

### Tier 6: Endgames

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier6/6.1_endgame_classification.jsonl` | Classify the type of endgame (e.g., KRK, KPK) | 30,000 | 18.1 MB |
| `tier6/6.2_endgame_wdl.jsonl` | Predict Win/Draw/Loss using Syzygy tablebase probing | 40,000 | 24.4 MB |
| `tier6/6.3_endgame_best_move.jsonl` | Find the best endgame move via DTZ tablebase lookup | 40,000 | 24.3 MB |
| `tier6/6.4_endgame_principles.jsonl` | Explain relevant endgame principles for the position | 30,000 | 20.9 MB |

### Tier 7: Planning

| File | Task | Examples | Size |
|------|------|----------|------|
| `tier7/7.1_best_move_selection.jsonl` | Select the best move from a Stockfish-evaluated position | 80,000 | 59.7 MB |
| `tier7/7.2_puzzle_solving.jsonl` | Solve a tactical puzzle (from Lichess puzzles database) | 50,000 | 41.4 MB |
| `tier7/7.3_move_consequence.jsonl` | Predict the evaluation change after a candidate move | 40,000 | 33.5 MB |

## License

This dataset is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).

Provider:
Chess-Nut-Engine