standard-chess-games
ModelScope community. Updated 2025-12-25; indexed 2024-12-21.
Download link:
https://modelscope.cn/datasets/AI-ModelScope/standard-chess-games
Resource description:
> [!CAUTION]
> This dataset is still a work in progress and some breaking changes might occur.
>
# Lichess Rated Standard Chess Games Dataset
## Dataset Description
**6,771,826,271** standard rated games, played on [lichess.org](https://lichess.org), updated monthly from the [database dumps](https://database.lichess.org/#standard_games).
This version of the data is meant for data analysis. If you need PGN files you can find those [here](https://database.lichess.org/#standard_games). That said, once you have a subset of interest, it is trivial to convert it back to PGN as shown in the [Dataset Usage](#dataset-usage) section.
This dataset is hive-partitioned into multiple parquet files on two keys: `year` and `month`:
```bash
.
├── data
│   └── year=2015
│       ├── month=01
│       │   ├── train-00000-of-00003.parquet
│       │   ├── train-00001-of-00003.parquet
│       │   └── train-00002-of-00003.parquet
│       ├── month=02
│       │   ├── train-00000-of-00003.parquet
│       │   ├── train-00001-of-00003.parquet
│       │   └── train-00002-of-00003.parquet
│       ├── ...
```
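Because the partition keys are encoded in the directory names, a single month can be selected by path alone. The sketch below is illustrative only: the helper names `partition_glob` and `parse_partition` are hypothetical, and it assumes the `data/year=YYYY/month=MM/` layout shown above with zero-padded months.

```python
from pathlib import PurePosixPath


def partition_glob(year: int, month: int, root: str = "data") -> str:
    """Build a glob pattern matching every parquet shard of one month."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # Month directories are zero-padded in the listing above (month=01).
    return f"{root}/year={year}/month={month:02d}/*.parquet"


def parse_partition(path: str) -> dict[str, int]:
    """Extract the hive partition keys (key=value directory names) from a path."""
    keys: dict[str, int] = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        if sep and key in ("year", "month"):
            keys[key] = int(value)
    return keys


pattern = partition_glob(2015, 1)
keys = parse_partition("data/year=2015/month=02/train-00000-of-00003.parquet")
```

A pattern built this way can be handed to any parquet reader that accepts globs, so only the shards of the requested month are read.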
### Dataset Usage
<!-- Using the `datasets` library:
```python
from datasets import load_dataset
dset = load_dataset("Lichess/standard-chess-games", split="train")
```
Using the `polars` library:
Using DuckDB:
Using `python-chess`: -->
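As a rough illustration of converting a row back to PGN, the sketch below uses plain Python. It assumes each row is a dict whose keys are PGN header names plus a `movetext` column, and that the `year` and `month` partition keys may appear as extra columns. The function name `row_to_pgn` and the sample values are hypothetical, not taken from the dataset.

```python
def row_to_pgn(row: dict, skip: tuple = ("movetext", "year", "month")) -> str:
    """Render one row as a PGN game: tag pairs, a blank line, then the movetext."""
    headers = []
    for key, value in row.items():
        if key in skip or value is None:
            continue
        # PGN tag values are quoted strings; escape backslashes and quotes.
        text = str(value).replace("\\", "\\\\").replace('"', '\\"')
        headers.append(f'[{key} "{text}"]')
    movetext = str(row.get("movetext") or "").strip()
    return "\n".join(headers) + "\n\n" + movetext + "\n"


# Hypothetical row, for illustration only.
sample_row = {
    "Event": "Rated Blitz game",
    "Site": "https://lichess.org/abcdefgh",
    "UTCDate": "2015.01.01",
    "WhiteElo": 1500,
    "BlackElo": 1480,
    "movetext": "1. e4 e5 2. Nf3 Nc6 1-0",
    "year": 2015,
    "month": 1,
}
pgn = row_to_pgn(sample_row)
```

Writing the returned strings to a file, separated by blank lines, should give a PGN file that standard tools can read, provided the assumed column names match the actual schema.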
## Dataset Details
### Dataset Sample
<!-- One row of the dataset looks like this:
```python
{
"Event":,
"Site":,
}
``` -->
### Dataset Fields
<!-- Every row of the dataset contains the following fields:
- **`Event`**: `string`,
- **`Site`**: `string`, -->
### Notes
- About 6% of the games include Stockfish analysis evaluations, always from White's point of view: `[%eval 2.35]` means a 235 centipawn advantage, and `[%eval #-4]` means White is getting mated in 4.
- The `WhiteElo` and `BlackElo` tags contain Glicko-2 ratings.
- For games played since April 2017, the `movetext` column includes clock information as PGN `%clk` comments.
- The schema doesn't include the `Date` header, which is typically part of the [Seven Tag Roster](https://en.wikipedia.org/wiki/Portable_Game_Notation#Seven_Tag_Roster), as we deemed the `UTCDate` field to be enough.
- A future version of the data will add a `UCI` column containing the corresponding moves in [UCI format](https://en.wikipedia.org/wiki/Universal_Chess_Interface).
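The `%eval` and `%clk` comments described in the notes can be pulled out of `movetext` with regular expressions. This is a minimal sketch, not an official parser: the function names are hypothetical, the sample movetext is invented, and the `h:mm:ss` clock format is an assumption about how the comments are written.

```python
import re

# "[%eval 2.35]" is a pawn-unit score; "[%eval #-4]" is a mate distance.
EVAL_RE = re.compile(r"\[%eval\s+(#?-?\d+(?:\.\d+)?)\]")
# Assumed clock format: hours:minutes:seconds, e.g. "[%clk 0:05:00]".
CLK_RE = re.compile(r"\[%clk\s+(\d+):(\d{2}):(\d{2})\]")


def parse_evals(movetext: str) -> list[tuple[str, int]]:
    """Return ("cp", centipawns) or ("mate", moves) per %eval, from White's view."""
    results = []
    for token in EVAL_RE.findall(movetext):
        if token.startswith("#"):
            results.append(("mate", int(token[1:])))
        else:
            results.append(("cp", round(float(token) * 100)))
    return results


def parse_clocks(movetext: str) -> list[int]:
    """Return the remaining clock time in seconds for each %clk comment."""
    return [int(h) * 3600 + int(m) * 60 + int(s) for h, m, s in CLK_RE.findall(movetext)]


# Invented movetext, for illustration only.
sample = "1. e4 { [%eval 2.35] [%clk 0:05:00] } 1... e5 { [%eval #-4] [%clk 0:04:58] }"
evals = parse_evals(sample)
clocks = parse_clocks(sample)
```

Negative values follow the convention in the notes: a negative centipawn score or mate distance means the position favors Black.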
Provider:
maas
Created:
2024-12-13



