codingmonster1234/chess-puzzles-rlvr
Source: Hugging Face · Updated 2026-04-21 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/codingmonster1234/chess-puzzles-rlvr
Resource description:
---
dataset_info:
  features:
  - name: fen
    dtype: string
  - name: rating
    dtype: int64
  - name: tags
    list: string
  - name: turn
    dtype: string
  - name: uci_moves
    list: string
  - name: uuid
    dtype: string
  splits:
  - name: train
    num_bytes: 867658667
    num_examples: 4278346
  - name: validation
    num_bytes: 102119282
    num_examples: 503581
  - name: test
    num_bytes: 51001348
    num_examples: 251434
  download_size: 786486627
  dataset_size: 1020779297
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
license: mit
language:
- en
size_categories:
- 1M<n<10M
---
# Dataset Card: Chess-Puzzles-RLVR
## Dataset Summary
This dataset is a processed, rating-stratified collection of approximately **5 million chess puzzles**, with Elo ratings ranging from **400 to 3300**. It is designed for **Curriculum Learning** and for agents trained with **Reinforcement Learning with Verifiable Rewards (RLVR)**.
Unlike standard puzzle datasets, this version is pre-sorted and split by "rating bucket", so that the training, validation, and test sets share nearly the same difficulty distribution across the entire rating range.
---
## Dataset Structure
### Data Instances
Each instance represents a unique chess puzzle with a starting position (FEN) and the correct sequence of moves (UCI).
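Because each instance carries a reference move sequence, a verifiable reward can be computed by plain string comparison. The sketch below is one possible reward, not the author's method: the function names and the `ply` parameter are hypothetical, and it assumes the model is asked to reproduce the reference line exactly. A puzzle with more than one winning move (for example, several mating moves) would need a legality-aware check instead.

```python
from typing import Sequence


def move_reward(predicted_uci: str, solution: Sequence[str], ply: int = 0) -> float:
    """Return 1.0 if the predicted move equals the reference move at `ply`, else 0.0."""
    if not 0 <= ply < len(solution):
        raise IndexError(f"ply {ply} is outside the solution of length {len(solution)}")
    # Normalise whitespace and case so "E2E4 " and "e2e4" compare equal.
    return 1.0 if predicted_uci.strip().lower() == solution[ply].lower() else 0.0


def sequence_reward(predicted: Sequence[str], solution: Sequence[str]) -> float:
    """Return the fraction of the reference line matched before the first mismatch."""
    if not solution:
        return 0.0
    matched = 0
    for guess, expected in zip(predicted, solution):
        if guess.strip().lower() != expected.lower():
            break
        matched += 1
    return matched / len(solution)
```

`move_reward` gives a binary signal for a single move, while `sequence_reward` gives partial credit for a correct prefix, which may be a gentler signal on long puzzles.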
### Key Fields (Schema)
* **`fen`** *(string)*: The Forsyth-Edwards Notation representing the board state before the first move of the puzzle.
* **`uci_moves`** *(list of strings)*: The sequence of best moves in Universal Chess Interface (UCI) format (e.g., `["e2e4", "e7e5"]`).
* **`rating`** *(int)*: The difficulty rating of the puzzle (Elo).
* **`tags`** *(list of strings)*: Tactical motifs associated with the puzzle (e.g., `["fork", "sacrifice", "mateIn2"]`).
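* **`turn`** *(string)*: Indicates which side is to move in the starting FEN (`"White"` or `"Black"`).
* **`uuid`** *(string)*: An identifier for the puzzle record (listed in the schema metadata above).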
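The field descriptions above translate into a few cheap sanity checks that can be run before training. This is a minimal sketch using only the standard library: `validate_record` is a hypothetical helper, it assumes every `fen` has the full six fields, and it checks only the shape of each UCI string, not whether the move is legal in the position.

```python
import re
from typing import Any, List, Mapping

# UCI move: from-square, to-square, optional promotion piece (e.g. "e7e8q").
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def validate_record(record: Mapping[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty means it looks valid)."""
    problems: List[str] = []

    fen_fields = str(record.get("fen", "")).split()
    if len(fen_fields) != 6:
        problems.append("fen should have six space-separated fields")
    elif fen_fields[1] not in ("w", "b"):
        problems.append("fen active-colour field should be 'w' or 'b'")
    else:
        # The second FEN field is the side to move; `turn` should agree with it.
        expected_turn = "White" if fen_fields[1] == "w" else "Black"
        if record.get("turn") != expected_turn:
            problems.append(f"turn should be {expected_turn!r} for this fen")

    moves = record.get("uci_moves") or []
    if not moves:
        problems.append("uci_moves is empty")
    for move in moves:
        if not UCI_MOVE.fullmatch(move):
            problems.append(f"malformed UCI move: {move!r}")

    if not isinstance(record.get("rating"), int):
        problems.append("rating should be an int")
    return problems


# Hypothetical record for illustration only; it is not taken from the dataset.
example = {
    "fen": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    "uci_moves": ["e2e4", "e7e5"],
    "rating": 1500,
    "tags": ["fork"],
    "turn": "White",
}
```

Full legality checking would need a chess library such as `python-chess`, which the card names as the tool used during preprocessing.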
### Data Splits
The dataset is split into three parts, with each split containing a proportionate amount of data from every 100-point rating interval:
| Split | Percentage | Purpose |
| :--- | :--- | :--- |
| **Train** | 85% | Primary data for model training. |
| **Validation** | 10% | Monitoring performance and detecting catastrophic forgetting across difficulty tiers. |
| **Test** | 5% | Final holdout set for objective evaluation. |
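The nominal percentages can be checked against the example counts in the card's `splits` metadata, and a single split can be read lazily rather than downloaded in full. The sketch below assumes the Hugging Face `datasets` package is installed for the streaming part; `split_shares` and `stream_split` are hypothetical helper names.

```python
# Example counts copied from the `splits` metadata of this card.
SPLIT_COUNTS = {"train": 4_278_346, "validation": 503_581, "test": 251_434}


def split_shares(counts):
    """Return each split's share of the total number of examples."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def stream_split(split="validation", repo_id="codingmonster1234/chess-puzzles-rlvr"):
    """Lazily iterate over one split without downloading it in full."""
    # Imported here so the arithmetic above works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(repo_id, split=split, streaming=True)


if __name__ == "__main__":
    for name, share in split_shares(SPLIT_COUNTS).items():
        print(f"{name}: {share:.2%}")
    # train: 85.00%
    # validation: 10.00%
    # test: 5.00%
```

The shares are not exactly 85/10/5 because the split is applied per bucket and each bucket is rounded separately, but they agree to within a hundredth of a percentage point.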
---
## Creation Process
### 1. Data Cleaning and Transformation
The dataset was transformed from a raw chess puzzle format using the `python-chess` library. The following steps were taken for every row:
* **Turn Extraction**: The active color was parsed directly from the FEN string.
* **String Tokenization**: Raw space-separated strings for `moves` and `tags` were converted into clean Python lists for easier model consumption.
* **Feature Pruning**: Redundant derived columns (e.g., `white_kingside`, `board`) were removed to reduce the dataset footprint, since the FEN string already encodes the full board state.
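The three steps above can be sketched per row as follows. This is an illustration rather than the original preprocessing script: it reads the active color with plain string handling instead of `python-chess` (where `chess.Board(fen).turn` carries the same information), and the raw column names `fen` and `rating` are assumptions; only `moves` and `tags` are named in the card.

```python
def turn_from_fen(fen: str) -> str:
    """Read the side to move from the second space-separated FEN field."""
    fields = fen.split()
    if len(fields) < 2 or fields[1] not in ("w", "b"):
        raise ValueError(f"cannot read active colour from FEN: {fen!r}")
    return "White" if fields[1] == "w" else "Black"


def clean_row(raw: dict) -> dict:
    """Build a cleaned row; any column not listed here is dropped (feature pruning)."""
    return {
        "fen": raw["fen"],
        "rating": int(raw["rating"]),
        # Space-separated strings become Python lists (string tokenization).
        "tags": raw["tags"].split(),
        "turn": turn_from_fen(raw["fen"]),
        "uci_moves": raw["moves"].split(),
    }
```

Building the output dictionary from an explicit list of keys means that columns such as `white_kingside` are pruned without having to enumerate them.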
### 2. Stratified Bucketing
To facilitate curriculum learning, the dataset underwent a unique **Stratified Bucketing** process:
1. The entire dataset was sorted globally by **rating**.
2. The data was partitioned into **29 buckets**, each representing a 100-point rating range (e.g., 400-500, 501-600, ..., 3200-3300).
3. The 85/10/5 split was applied **locally within each bucket**.
4. These local splits were then re-concatenated into the final global `train`, `validation`, and `test` splits.
Because every bucket contributes to each split in the same proportion, the validation set reflects the model's ability across the entire difficulty spectrum, regardless of whether the model is currently training on "easy" or "hard" data.
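The four steps can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the original script: `bucket_index` and `stratified_split` are hypothetical names, and the seeded shuffle inside each bucket is an assumption, since the card does not say how rows were ordered within a bucket before the local split.

```python
import random
from collections import defaultdict


def bucket_index(rating, low=400, width=100):
    """Map a rating to its bucket: 400-500 -> 0, 501-600 -> 1, ..., 3201-3300 -> 28."""
    return max(0, (rating - low - 1) // width)


def stratified_split(rows, fractions=(0.85, 0.10, 0.05), seed=0):
    """Split rows 85/10/5 inside each rating bucket, then concatenate the pieces."""
    buckets = defaultdict(list)
    # Steps 1 and 2: sort globally by rating, then partition into buckets.
    for row in sorted(rows, key=lambda r: r["rating"]):
        buckets[bucket_index(row["rating"])].append(row)

    rng = random.Random(seed)
    train, validation, test = [], [], []
    for index in sorted(buckets):
        bucket = buckets[index]
        rng.shuffle(bucket)  # Assumption: shuffle within the bucket before splitting.
        # Step 3: apply the split locally; the test split takes the remainder.
        n_train = round(len(bucket) * fractions[0])
        n_val = round(len(bucket) * fractions[1])
        # Step 4: re-concatenate the local splits into the global ones.
        train.extend(bucket[:n_train])
        validation.extend(bucket[n_train:n_train + n_val])
        test.extend(bucket[n_train + n_val:])
    return train, validation, test
```

Iterating over buckets in ascending order keeps each output split ordered from easiest to hardest bucket, which is convenient for curriculum sampling.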
---
## Usage Considerations
This dataset is optimized for a **Sliding Window Sampler**. During training, it is recommended to:
1. Sample **80%** of your batch from the model's current "target" rating bucket.
2. Sample **20%** from all previously learned (easier) buckets to maintain tactical proficiency and prevent regression.
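The 80/20 recommendation can be sketched as a small batch sampler. The function name, its parameters, and the choice to sample with replacement are all assumptions made for illustration; `buckets` is expected to be a list of row lists ordered from the easiest rating bucket to the hardest.

```python
import random


def sample_batch(buckets, target, batch_size, rng, target_share=0.8):
    """Draw a batch mostly from bucket `target`, topped up from easier buckets."""
    # Replay pool: every puzzle from the buckets below the current target.
    replay_pool = [row for bucket in buckets[:target] for row in bucket]
    # With no easier data available, the whole batch comes from the target bucket.
    n_replay = (batch_size - round(batch_size * target_share)) if replay_pool else 0
    n_target = batch_size - n_replay

    batch = [rng.choice(buckets[target]) for _ in range(n_target)]
    batch.extend(rng.choice(replay_pool) for _ in range(n_replay))
    rng.shuffle(batch)  # Avoid a fixed target-then-replay order inside the batch.
    return batch
```

Pooling the easier buckets means larger buckets are replayed more often. Sampling a bucket first and then a row within it would weight all earlier buckets equally instead.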
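### Dataset Info
#### Features
1. **`fen`**: string; the Forsyth-Edwards Notation (FEN) for the board state before the first move of the puzzle.
2. **`rating`**: 64-bit integer; the puzzle's Elo difficulty rating.
3. **`tags`**: list of strings; the tactical motifs associated with the puzzle.
4. **`turn`**: string; the side to move in the starting FEN (`"White"` or `"Black"`).
5. **`uci_moves`**: list of strings; the sequence of best moves in Universal Chess Interface (UCI) format.
#### Data Splits
1. **Train**: 696,475,259 bytes, 4,278,346 examples
2. **Validation**: 81,978,341 bytes, 503,581 examples
3. **Test**: 40,931,135 bytes, 251,434 examples
#### Metadata
- Download size: 606,009,545 bytes
- Total dataset size: 819,384,735 bytes
#### Configuration
The default configuration uses the following data file paths:
- Train: `data/train-*`
- Validation: `data/validation-*`
- Test: `data/test-*`
- License: MIT
- Language: English
- Size category: 1M < n < 10M examples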
Provided by:
codingmonster1234



