Lichess/chess-puzzles-with-games

Name: Lichess/chess-puzzles-with-games
Creator: Lichess
Published: 2026-03-28 17:52:35
License: cc0-1.0

Hugging Face · Updated 2026-03-28 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/Lichess/chess-puzzles-with-games

Resource description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
license: cc0-1.0
tags:
- chess
- lichess
- puzzles
pretty_name: Lichess Puzzles with Game Information
size_categories:
- 1M<n<10M
dataset_info:
  features:
  - name: Popularity
    dtype: int8
  - name: FEN
    dtype: string
  - name: RatingDeviation
    dtype: uint16
  - name: Themes
    list: string
  - name: NbPlays
    dtype: uint32
  - name: Moves
    dtype: string
  - name: Rating
    dtype: uint16
  - name: PuzzleId
    dtype: string
  - name: perf
    dtype: string
  - name: tournament
    dtype: string
  - name: swiss
    dtype: string
  - name: lastMoveAt
    dtype: timestamp[ms]
  - name: clock
    struct:
    - name: increment
      dtype: int64
    - name: initial
      dtype: int64
    - name: totalTime
      dtype: int64
  - name: variant
    dtype: string
  - name: speed
    dtype: string
  - name: createdAt
    dtype: timestamp[ms]
  - name: analysis
    list: json
  - name: moves
    dtype: string
  - name: status
    dtype: string
  - name: GameId
    dtype: string
  - name: ECO
    dtype: string
  - name: Opening
    dtype: string
  - name: OpeningPly
    dtype: int64
  - name: White
    dtype: string
  - name: WhiteElo
    dtype: int64
  - name: WhiteRatingDiff
    dtype: int16
  - name: WhiteProvisional
    dtype: bool
  - name: WhiteAcpl
    dtype: uint16
  - name: WhiteAccuracy
    dtype: int64
  - name: WhiteMistake
    dtype: uint8
  - name: WhiteInaccuracy
    dtype: uint8
  - name: WhiteBlunder
    dtype: uint8
  - name: Black
    dtype: string
  - name: BlackElo
    dtype: int64
  - name: BlackRatingDiff
    dtype: int16
  - name: BlackProvisional
    dtype: bool
  - name: BlackAcpl
    dtype: uint16
  - name: BlackAccuracy
    dtype: int64
  - name: BlackMistake
    dtype: uint8
  - name: BlackInaccuracy
    dtype: uint8
  - name: BlackBlunder
    dtype: uint8
  - name: WhiteTitle
    dtype: string
  - name: BlackTitle
    dtype: string
  - name: Result
    dtype: string
  - name: movetext
    dtype: string
  splits:
  - name: train
    num_bytes: 27910847742
    num_examples: 2969886
  download_size: 18401113171
  dataset_size: 27910847742
---

> [!CAUTION]
> This dataset is still a work in progress. Expect breaking changes.
> This dataset was contributed by [Marco Cognetta](https://github.com/mcognetta). The original archived repository describing the project can be found here: https://github.com/mcognetta/lichess-combined-puzzle-game-db

## Background

> This contains every puzzle from the [Lichess Puzzle Database](https://database.lichess.org/#puzzles) joined with their games from the [Lichess Game Database](https://database.lichess.org/#standard_games). The puzzle data was pulled in September 2022. There are 2,969,948 puzzles in total. The game info was pulled from the [Lichess API](https://lichess.org/api) with [every flag enabled](https://lichess.org/api#tag/Games/operation/gamesExportIds).

## Format

> The database is given as a single `bzip2` compressed [ndjson](http://ndjson.org/) file. Each line contains a JSON object with two top-level fields: `game` and `puzzle`. The `game` object contains the entire JSON dump of the game information from the [Lichess API call](https://lichess.org/api#tag/Games/operation/gamesExportIds) with every flag enabled. The `puzzle` object contains all of the information from the puzzle database entry, with the field names being taken from the [csv headers](https://database.lichess.org/#puzzles). That is, you can expect the following fields in the `puzzle` object (though they are not necessarily all populated): `PuzzleId`, `FEN`, `Moves`, `Rating`, `RatingDeviation`, `Popularity`, `NbPlays`, `Themes`, `GameUrl`, `OpeningFamily`, and `OpeningVariation`.
>
> The games and puzzles are joined by the game id (`id` in the `game` object). The matching `id` is extracted from a puzzle's `GameUrl` field. Note that `PuzzleId` and `id` are unrelated.
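The Format notes say the game `id` is extracted from a puzzle's `GameUrl`. A minimal sketch of that extraction follows, assuming the URL carries the id as its first path segment and that game ids are 8 characters long. The function name `game_id_from_url` and the sample URLs are illustrative and not taken from the dataset card.

```python
from urllib.parse import urlparse


def game_id_from_url(game_url: str) -> str:
    """Return the game id embedded in a puzzle's GameUrl.

    Assumption: the id is the first path segment and is 8 characters
    long; a trailing "/black" segment or "#48" ply fragment is ignored.
    """
    path = urlparse(game_url).path            # e.g. "/787zsVup/black"
    first_segment = path.strip("/").split("/")[0]
    return first_segment[:8]
```

Truncating to 8 characters is a defensive choice for URLs that might carry a longer, player-specific id; it can be dropped if every URL is known to use the short form.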

Provider:

Lichess

Summary

Dataset Introduction

Construction

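In chess AI research, datasets are typically built by mining and structuring large volumes of real games. This dataset merges two core Lichess data sources: the puzzle database and the game database. The complete Lichess puzzle database was extracted in September 2022, and the full game record associated with each puzzle was then fetched through the Lichess API with every flag enabled. Finally, each puzzle object was joined to its corresponding game object via the game ID, producing a single ndjson file compressed with bzip2 for easier storage and transfer.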
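The join step described above can be sketched as follows. This is an illustration under stated assumptions, not the contributor's actual pipeline: the function name `join_puzzles_with_games` is hypothetical, puzzles are assumed to be dicts with `PuzzleId` and `GameUrl`, games are assumed to be dicts with an `id`, and the id is assumed to be the first 8 characters of the URL's first path segment.

```python
from urllib.parse import urlparse


def join_puzzles_with_games(puzzles, games):
    """Pair each puzzle with its game and report puzzles with no match.

    Returns (joined, unmatched): `joined` holds {"game": ..., "puzzle": ...}
    records in the two-field layout the Format notes describe, and
    `unmatched` holds the PuzzleId of every puzzle whose game was not found.
    """
    games_by_id = {game["id"]: game for game in games}   # O(1) lookups
    joined, unmatched = [], []
    for puzzle in puzzles:
        # Assumed URL shape: https://lichess.org/<8-char id>[/black][#ply]
        path = urlparse(puzzle["GameUrl"]).path
        game_id = path.strip("/").split("/")[0][:8]
        game = games_by_id.get(game_id)
        if game is None:
            unmatched.append(puzzle["PuzzleId"])
        else:
            joined.append({"game": game, "puzzle": puzzle})
    return joined, unmatched
```

Tracking unmatched puzzles separately makes it easy to check how many records a join drops, which matters when row counts are compared across sources.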

Usage

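The dataset primarily serves research and applications in chess AI, chess teaching analysis, and computational game theory. Users can parse the ndjson file line by line and access the structured `puzzle` and `game` objects together. In machine learning tasks, the puzzle data can serve as training samples or as an evaluation benchmark, for example when building tactical solvers. Combined with the full game context, it also supports analysis of when and how often particular tactical patterns arise in real play and how they affect the result. The detailed game metadata and analysis metrics further make the dataset usable for player-style modeling, opening-book construction, and automated game-quality assessment.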
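A minimal sketch of the line-by-line parsing mentioned above, using only the Python standard library. It assumes the original distribution format (a `bzip2`-compressed ndjson file with one `{"game": ..., "puzzle": ...}` object per line); the function name `iter_records` is illustrative.

```python
import bz2
import json


def iter_records(path):
    """Yield one decoded JSON object per non-empty line of a .bz2 ndjson file."""
    # Text mode lets bz2 decompress and decode incrementally, so the
    # whole file never has to fit in memory.
    with bz2.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:                      # tolerate blank lines
                yield json.loads(line)
```

Each yielded record can then be read as `record["puzzle"]` and `record["game"]`. The copy hosted on Hugging Face instead exposes flat columns such as `PuzzleId` and `GameId`, so this reader applies only to the original archive.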

Background and Challenges

Background Overview

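Chess AI research has long relied on high-quality game data and tactical puzzles to advance position analysis and decision algorithms. The chess-puzzles-with-games dataset was built in 2022 by researcher Marco Cognetta and combines the Lichess puzzle database with the complete game information for each puzzle. Its core aim is to link each puzzle to the context of its original game, giving machine learning models rich data on board states, tactical themes, and player behavior, and so a better understanding of complex chess decision-making. Its scale and structure support empirical work on tactic recognition, position evaluation, and opening research, and position it as a benchmark resource for the field.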

Current Challenges

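The dataset targets a key challenge in tactic recognition and decision modeling: how to extract instructive tactical fragments from a very large number of games and tie them to the full game context, so that the conditions under which a tactic arises can be understood. The main construction challenges were: efficiently matching millions of records across two heterogeneous sources (the Lichess API and the puzzle database) while ensuring that each game ID corresponds correctly to its puzzle URL; handling temporal consistency and completeness, so that missing game information or inconsistent formats do not cause data loss; and managing storage and compression at this scale (nearly 3 million records) while keeping the data accessible and efficient to process. These challenges place strict demands on the dataset's reliability and usability.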

Common Scenarios

Classic Use Cases

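In chess AI research, chess-puzzles-with-games provides rich material for position analysis and strategy generation. By combining Lichess puzzles with their complete games, it lets researchers study the best solutions and tactical patterns in specific positions. A classic use case is training reinforcement learning agents: by parsing millions of puzzles annotated with rating, themes, and moves, a model can learn complex decision processes and imitate human players' reasoning, improving its strategic planning in dynamic environments.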
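To turn a puzzle into a training or evaluation example, the `Moves` string has to be split into the move that sets up the puzzle and the solution line. The sketch below assumes the convention documented for the Lichess puzzle database: `Moves` is a space-separated list of UCI moves, the first move is the opponent's move that leads from `FEN` into the puzzle position, and the solver's moves then alternate with the opponent's replies. The function name `split_puzzle_moves` is illustrative.

```python
def split_puzzle_moves(moves: str):
    """Split a puzzle's Moves string into (setup, solver_moves, replies).

    Assumed convention: the first UCI move is played by the opponent to
    reach the puzzle position; after that the solver and the opponent
    alternate, starting with the solver.
    """
    uci_moves = moves.split()
    if len(uci_moves) < 2:
        raise ValueError("expected a setup move followed by at least one solution move")
    setup_move = uci_moves[0]
    solution = uci_moves[1:]
    solver_moves = solution[0::2]      # moves the model must find
    opponent_replies = solution[1::2]  # forced replies in between
    return setup_move, solver_moves, opponent_replies
```

For a tactical-solver benchmark, only `solver_moves` would be scored; the opponent's replies are applied to the board between predictions.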

Academic Problems Addressed

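The dataset helps address several core academic problems in computational chess, such as optimizing position evaluation functions and automatically recognizing tactical patterns. With detailed features such as position evaluations, opening classification, and per-player error counts (inaccuracies, mistakes, and blunders), researchers can build more precise predictive models to analyze players' behavioral biases and how strategies evolve. Its significance lies in advancing AI work in game theory, offering an empirical basis for developing general decision-making systems, and encouraging interdisciplinary research that treats board games as a testbed for understanding intelligent decision-making.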
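As one small example of working with the per-player error statistics, the sketch below averages centipawn loss per color over a collection of rows. It assumes the flat column names listed in the dataset metadata (`WhiteAcpl`, `BlackAcpl`) and assumes that a game without computer analysis has `None` in those columns; that null handling is a guess, and the function name `mean_acpl` is illustrative.

```python
from statistics import mean


def mean_acpl(rows):
    """Average centipawn loss per color, skipping rows without a value."""
    # Assumption: un-analysed games carry None (or lack the key entirely).
    white = [r["WhiteAcpl"] for r in rows if r.get("WhiteAcpl") is not None]
    black = [r["BlackAcpl"] for r in rows if r.get("BlackAcpl") is not None]
    return {
        "white": mean(white) if white else None,
        "black": mean(black) if black else None,
    }
```

The same pattern extends to `WhiteBlunder`, `BlackBlunder`, and the other count columns when a richer error profile is needed.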

Practical Applications

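In practice, chess-puzzles-with-games is widely used to build educational tools and training platforms that help players improve tactically. Applications built on it can recommend personalized puzzles, analyze weaknesses in a user's games, and simulate game situations for targeted practice. Game developers and tournament organizers also use the data to tune the difficulty of AI opponents and improve the user experience, and it gives the chess community data-driven insight that supports wider participation and stronger play.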
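A personalized puzzle recommender of the kind described above could start from a simple filter. The sketch below is hypothetical: it assumes rows with the flat columns `Themes` (a list of strings), `Rating`, and `Popularity` from the dataset metadata, and the selection rule (same theme, rating within a fixed window, most popular first) is an illustrative choice rather than anything the dataset prescribes.

```python
def recommend_puzzles(puzzles, player_rating, theme, window=100, limit=5):
    """Pick puzzles on `theme` whose rating is within `window` of the player's."""
    candidates = [
        p for p in puzzles
        if theme in p["Themes"] and abs(p["Rating"] - player_rating) <= window
    ]
    # Most popular first; ties go to the puzzle closest to the player's rating.
    candidates.sort(key=lambda p: (-p["Popularity"], abs(p["Rating"] - player_rating)))
    return candidates[:limit]
```

A real system would also exclude puzzles the player has already solved and might weight by `RatingDeviation`, both of which are left out here for brevity.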

Recent Research on the Dataset