codingmonster1234/chess-puzzles-rlvr

Name: codingmonster1234/chess-puzzles-rlvr
Creator: codingmonster1234
Published: 2026-04-21 06:39:26
License: not specified in the listing (the dataset card declares MIT)

Hugging Face · Updated 2026-04-21 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/codingmonster1234/chess-puzzles-rlvr

Resource description:

---
dataset_info:
  features:
  - name: fen
    dtype: string
  - name: rating
    dtype: int64
  - name: tags
    list: string
  - name: turn
    dtype: string
  - name: uci_moves
    list: string
  - name: uuid
    dtype: string
  splits:
  - name: train
    num_bytes: 867658667
    num_examples: 4278346
  - name: validation
    num_bytes: 102119282
    num_examples: 503581
  - name: test
    num_bytes: 51001348
    num_examples: 251434
  download_size: 786486627
  dataset_size: 1020779297
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
license: mit
language:
- en
size_categories:
- 1M<n<10M
---

# Dataset Card: Chess-Puzzles-RLVR

## Dataset Summary

This dataset is a highly processed and stratified collection of approximately **5 million chess puzzles**, ranging from Elo ratings of **400 to 3300**. It is specifically designed for **Curriculum Learning** and **Reinforcement Learning (RLVR)** agents.

Unlike standard puzzle datasets, this version is pre-sorted and split into "rating buckets" to ensure that training, validation, and testing sets maintain an identical difficulty distribution across the entire spectrum.

---

## Dataset Structure

### Data Instances

Each instance represents a unique chess puzzle with a starting position (FEN) and the correct sequence of moves (UCI).

### Key Fields (Schema)

* **`fen`** *(string)*: The Forsyth-Edwards Notation representing the board state before the first move of the puzzle.
* **`uci_moves`** *(list of strings)*: The sequence of best moves in Universal Chess Interface (UCI) format (e.g., `["e2e4", "e7e5"]`).
* **`rating`** *(int)*: The difficulty rating of the puzzle (Elo).
* **`tags`** *(list of strings)*: Tactical motifs associated with the puzzle (e.g., `["fork", "sacrifice", "mateIn2"]`).
* **`turn`** *(string)*: Indicates which side is to move in the starting FEN (`"White"` or `"Black"`).
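As a rough illustration of how the fields above relate to one another, the sketch below mirrors the schema in a small dataclass and checks two properties the descriptions imply: `turn` agrees with the FEN's active-colour field, and every entry of `uci_moves` looks like a UCI move. The `PuzzleRecord` class, the `side_to_move` and `validate` helpers, and the sample record are hypothetical names and data introduced here for illustration; they are not part of the dataset or of any library, and the move check is a format check only, not a legality check.

```python
import re
from dataclasses import dataclass
from typing import List

# Simplified UCI shape: from-square, to-square, optional promotion piece.
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


@dataclass
class PuzzleRecord:
    """Hypothetical mirror of the card's key fields (not an official class)."""
    fen: str
    uci_moves: List[str]
    rating: int
    tags: List[str]
    turn: str


def side_to_move(fen: str) -> str:
    """Return "White" or "Black" from the FEN's second (active-colour) field."""
    fields = fen.split()
    if len(fields) < 2 or fields[1] not in ("w", "b"):
        raise ValueError(f"malformed FEN: {fen!r}")
    return "White" if fields[1] == "w" else "Black"


def validate(record: PuzzleRecord) -> None:
    """Raise ValueError if the record contradicts the field descriptions."""
    if record.turn != side_to_move(record.fen):
        raise ValueError("turn does not match the FEN's active colour")
    bad = [m for m in record.uci_moves if not UCI_MOVE.fullmatch(m)]
    if bad:
        raise ValueError(f"not UCI-formatted: {bad}")


# Made-up record built on the standard starting position; NOT a dataset row.
example = PuzzleRecord(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    uci_moves=["e2e4", "e7e5"],
    rating=400,
    tags=["opening"],
    turn="White",
)
validate(example)  # passes silently when the record is consistent
```

For full move-legality checks, a chess library such as `python-chess` (which the card says was used to build the dataset) would be the more appropriate tool; the standard-library version here only keeps the sketch dependency-free.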
### Data Splits

The dataset is split into three parts, with each split containing a proportionate amount of data from every 100-point rating interval:

| Split | Percentage | Purpose |
| :--- | :--- | :--- |
| **Train** | 85% | Primary data for model training. |
| **Validation** | 10% | Monitoring performance and preventing catastrophic forgetting across difficulty tiers. |
| **Test** | 5% | Final holdout set for objective evaluation. |

---

## Creation Process

### 1. Data Cleaning and Transformation

The dataset was transformed from a raw chess puzzle format using the `python-chess` library. The following steps were taken for every row:

* **Turn Extraction**: The active color was parsed directly from the FEN string.
* **String Tokenization**: Raw space-separated strings for `moves` and `tags` were converted into clean Python lists for easier model consumption.
* **Feature Pruning**: Redundant boolean flags (e.g., `white_kingside`, `board`) were removed to reduce the dataset footprint and focus strictly on necessary state representation.

### 2. Stratified Bucketing

To facilitate curriculum learning, the dataset underwent a unique **Stratified Bucketing** process:

1. The entire dataset was sorted globally by **rating**.
2. The data was partitioned into **29 buckets**, each representing a 100-point rating range (e.g., 400-500, 501-600, ..., 3200-3300).
3. The 85/10/5 split was applied **locally within each bucket**.
4. These local splits were then re-concatenated into the final global `train`, `validation`, and `test` splits.

This ensures that whether the model is training on "easy" or "hard" data, the validation set always provides a statistically accurate reflection of the model's ability across the entire difficulty spectrum.

---

## Usage Considerations

This dataset is optimized for a **Sliding Window Sampler**. During training, it is recommended to:

1. Sample **80%** of your batch from the model's current "target" rating bucket.
2. Sample **20%** from all previously learned (easier) buckets to maintain tactical proficiency and prevent regression.
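The bucketing steps and the 80/20 sampling recommendation can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's pipeline: the function names (`bucket_index`, `stratified_split`, `sample_batch`) are hypothetical, rows are shuffled inside each bucket before splitting (the card does not say how rows were ordered within a bucket), review samples are drawn uniformly from the pooled easier buckets, and the ratings are synthetic.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence

MIN_RATING, BUCKET_WIDTH, NUM_BUCKETS = 400, 100, 29


def bucket_index(rating: int) -> int:
    """Map a rating to bucket 0..28 (400-500 -> 0, 501-600 -> 1, ..., 3200-3300 -> 28)."""
    idx = (rating - MIN_RATING - 1) // BUCKET_WIDTH
    return max(0, min(NUM_BUCKETS - 1, idx))


def stratified_split(ratings: Sequence[int], seed: int = 0) -> Dict[str, List[int]]:
    """Split row indices 85/10/5 inside each bucket, then concatenate per split."""
    buckets: Dict[int, List[int]] = defaultdict(list)
    for row, rating in enumerate(ratings):
        buckets[bucket_index(rating)].append(row)
    rng = random.Random(seed)
    splits: Dict[str, List[int]] = {"train": [], "validation": [], "test": []}
    for b in sorted(buckets):
        rows = buckets[b]
        rng.shuffle(rows)  # assumption: shuffle within the bucket before cutting
        n_train = len(rows) * 85 // 100
        n_val = len(rows) * 10 // 100
        splits["train"] += rows[:n_train]
        splits["validation"] += rows[n_train:n_train + n_val]
        splits["test"] += rows[n_train + n_val:]  # remainder, about 5%
    return splits


def sample_batch(by_bucket: Dict[int, List[int]], target: int,
                 batch_size: int, rng: random.Random) -> List[int]:
    """Draw 80% of the batch from the target bucket and 20% from easier buckets."""
    easier = [row for b, rows in by_bucket.items() if b < target for row in rows]
    n_review = batch_size // 5 if easier else 0  # no review pool at the first bucket
    batch = rng.choices(by_bucket[target], k=batch_size - n_review)
    if n_review:
        batch += rng.choices(easier, k=n_review)
    return batch


# Synthetic ratings: 100 rows in each of the 29 buckets (NOT real dataset values).
ratings = [401 + BUCKET_WIDTH * b for b in range(NUM_BUCKETS) for _ in range(100)]
splits = stratified_split(ratings)

# Group the training rows by bucket so the sampler can address them.
train_by_bucket: Dict[int, List[int]] = defaultdict(list)
for row in splits["train"]:
    train_by_bucket[bucket_index(ratings[row])].append(row)

batch = sample_batch(train_by_bucket, target=3, batch_size=10, rng=random.Random(1))
```

Because the split is cut inside each bucket, every bucket contributes the same 85/10/5 proportions to the global splits, which is the property the card relies on. Pooling the easier buckets weights review samples by bucket size; sampling a bucket first and then a row would weight each easier tier equally instead.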

### Dataset Information (listing summary)

#### Splits

| Split | Bytes | Examples |
| :--- | :--- | :--- |
| **Train** | 696,475,259 | 4,278,346 |
| **Validation** | 81,978,341 | 503,581 |
| **Test** | 40,931,135 | 251,434 |

#### Metadata

- Download size: 606,009,545 bytes
- Dataset size: 819,384,735 bytes

The features, default-config data file paths (`data/train-*`, `data/validation-*`, `data/test-*`), license (MIT), language (English), and size category (1M<n<10M) recorded in this summary match the `dataset_info` header of the dataset card. The example counts also match; the byte counts listed here differ from that header.

Provider:

codingmonster1234
