TannerGladson/chess-roberta-pretraining
Source: Hugging Face. Updated 2024-05-12; indexed 2024-06-26.
Download link: https://hf-mirror.com/datasets/TannerGladson/chess-roberta-pretraining
Description:
---
# For reference on dataset card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/datasets-cards
{}
---
# Dataset Card for chess-roberta-pretraining
A collection of move sequences from chess games found at https://database.lichess.org/.
## Dataset Details
A collection of chess board states and their associated move sequences. The PGN (Portable Game Notation) files were downloaded from https://database.lichess.org/. Each game was parsed into multiple records; each record begins with a FEN (Forsyth-Edwards Notation) string and is followed by 1 to 10 moves in SAN (Standard Algebraic Notation).
## Uses
This dataset will be used to train ChessRoberta. It is likely unsuitable for building a high-performing chess model, because the games have not been filtered.
## Dataset Structure
Each line is a record, and each record has the following fields:
* text: (Str) the string described above, in which a FEN and several SANs have been concatenated
* pgn_start: (Int) the zero-based index of the first character of the first SAN in the text field's string
* num_sans: (Int) the number of half-moves contained in the text field's string
* num_prior_moves: (Int) the number of half-moves that elapsed before this record's FEN (note: Black and White each moving once counts as two elapsed moves)
* game_id: (Str) the lichess URL of the game the record was taken from
Within the text field, special tokens are used as separators:
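Because `num_prior_moves` counts half-moves, it can be converted to a conventional move number and side to move. The following is a minimal sketch, assuming the game began from the standard starting position with White to move; the helper name `position_from_prior_moves` is hypothetical and not part of the dataset.

```python
def position_from_prior_moves(num_prior_moves: int) -> tuple[int, str]:
    """Return (full-move number, side to move) after `num_prior_moves` half-moves.

    Assumes the game started from the standard position with White to move.
    """
    if num_prior_moves < 0:
        raise ValueError("num_prior_moves must be non-negative")
    # Two half-moves (one by each side) make up one full move.
    full_move_number = num_prior_moves // 2 + 1
    # White moves on even half-move counts, Black on odd ones.
    side_to_move = "w" if num_prior_moves % 2 == 0 else "b"
    return full_move_number, side_to_move


print(position_from_prior_moves(0))  # (1, 'w')
print(position_from_prior_moves(5))  # (3, 'b')
```

Under that assumption, the two values should agree with the side-to-move and full-move-number fields of the record's FEN.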
```
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}
```
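As an illustration of how these separators fit together, the sketch below joins a FEN and a list of SANs into a `text` string and derives `pgn_start`. The function `build_text` is a hypothetical helper written for this example, not the code used to create the dataset.

```python
SPECIAL_TOKENS = {"PGN_START": "~", "MOVE_SEP": ">"}


def build_text(fen: str, sans: list[str]) -> tuple[str, int]:
    """Join a FEN and its SANs; return (text, pgn_start)."""
    if not 1 <= len(sans) <= 10:
        raise ValueError("a record holds 1 to 10 SANs")
    # The FEN comes first, followed by the PGN_START token.
    prefix = fen + SPECIAL_TOKENS["PGN_START"]
    # Moves are joined with MOVE_SEP; there is no trailing separator.
    text = prefix + SPECIAL_TOKENS["MOVE_SEP"].join(sans)
    # The first SAN begins right after the prefix (zero-based index).
    return text, len(prefix)


start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
text, pgn_start = build_text(start_fen, ["e4", "e6", "d3", "d5", "Nd2"])
print(pgn_start)  # 57
```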
### Example Record
```
{
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3"
}
```
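A record like the one above can be split back into its FEN and moves using `pgn_start`. This is a minimal sketch; `split_record` is a hypothetical helper, and the consistency checks reflect the field descriptions in this card rather than any official loader.

```python
def split_record(record: dict) -> tuple[str, list[str]]:
    """Split a record's text into (fen, list of SANs), checking its metadata."""
    text = record["text"]
    start = record["pgn_start"]
    # The character just before the first SAN should be the PGN_START token.
    if text[start - 1] != "~":
        raise ValueError("pgn_start does not follow the '~' separator")
    fen = text[: start - 1]
    sans = text[start:].split(">")
    if len(sans) != record["num_sans"]:
        raise ValueError("num_sans does not match the number of moves in text")
    return fen, sans


example = {
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3",
}
fen, sans = split_record(example)
print(fen)   # rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
print(sans)  # ['e4', 'e6', 'd3', 'd5', 'Nd2']
```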
Provider: TannerGladson



