TannerGladson/chess-roberta-pretraining

Name: TannerGladson/chess-roberta-pretraining
Creator: TannerGladson
Published: 2024-05-12 03:55:44
License: No description available

Source: Hugging Face. Updated 2024-05-12. Indexed 2024-06-26.

Download link:

https://hf-mirror.com/datasets/TannerGladson/chess-roberta-pretraining

Resource description:

---
# For reference on dataset card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/datasetcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/datasets-cards
{}
---

# Dataset Card for Dataset Name

A collection of move sequences from chess games found at https://database.lichess.org/

## Dataset Details

A collection of chess board states and the associated move sequences. PGN files have been downloaded from https://database.lichess.org/. Each game has been parsed into multiple records, where each record begins with an FEN and is followed by 1 to 10 SANs.

## Uses

Will be used to train ChessRoberta. This dataset should probably not be used to create a high-performing chess model, because the games have not been filtered.

## Dataset Structure

Each line is a record, and each record has the following fields:

* text: (Str) the aforementioned string, in which an FEN and several SANs have been concatenated
* pgn_start: (Int) the index of the first character of the first SAN (in the text field's string)
* num_sans: (Int) the number of half-moves contained in the text field's string
* num_prior_moves: (Int) the number of half-moves which elapsed before this record's FEN (remark: black and white each moving once is two elapsed moves)
* game_id: (Str) the lichess identifier of the game the record was taken from

Within the text field, special tokens are used as separators:

```
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}
```

### Example Record

```
{
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3"
}
```
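The record layout described above can be unpacked with plain string operations. The following is a minimal sketch, not the author's own tooling: the helper name `split_record_text` is hypothetical, and the check on `pgn_start` assumes the index is zero-based, which is consistent with the example record (the FEN is 56 characters long, so the first SAN starts at index 57).

```python
from typing import List, Tuple

# Separator tokens as given in the dataset card.
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}


def split_record_text(text: str) -> Tuple[str, List[str]]:
    """Split a record's text field into (fen, list of SANs).

    Hypothetical helper: splits once on the PGN_START token, then
    splits the remainder on the MOVE_SEP token.
    """
    fen, _, moves = text.partition(SPECIAL_TOKENS["PGN_START"])
    sans = moves.split(SPECIAL_TOKENS["MOVE_SEP"]) if moves else []
    return fen, sans


# The example record from the dataset card.
record = {
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3",
}

fen, sans = split_record_text(record["text"])
print(fen)
print(sans)

# Consistency checks, assuming pgn_start is a zero-based character index.
assert record["text"][record["pgn_start"]:].startswith(sans[0])
assert len(sans) == record["num_sans"]
```

Neither separator character occurs in a FEN or in SAN move text, so a single split on each token is enough here.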

Provider:

TannerGladson

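Summary of original information

Dataset Card for Dataset Name

Dataset details

This is a collection of move sequences from chess games, sourced from https://database.lichess.org/. The dataset contains board states and the associated move sequences. PGN files were downloaded from https://database.lichess.org/, and each game was parsed into multiple records, where each record begins with an FEN and is followed by 1 to 10 SANs.

Uses

The dataset will be used to train ChessRoberta. It is not recommended for creating a high-performing chess model, because the games have not been filtered.

Dataset structure

Each line is a record, and each record has the following fields:

text: (Str) a string in which an FEN and several SANs have been concatenated
pgn_start: (Int) the index of the first character of the first SAN in the text field's string
num_sans: (Int) the number of half-moves contained in the text field's string
num_prior_moves: (Int) the number of half-moves that elapsed before this record's FEN (note: black and white each moving once counts as two moves)
game_id: (Str) the lichess identifier of the game the record was taken from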
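The half-move count in num_prior_moves can be related to the FEN itself. The sketch below derives a ply count from a FEN's side-to-move and fullmove-number fields. It is an illustration under two assumptions that the card does not confirm: that games start from the standard position with fullmove number 1, and that the dataset's num_prior_moves follows the same convention. The function name `plies_before_fen` is hypothetical.

```python
def plies_before_fen(fen: str) -> int:
    """Count half-moves played before the position described by a FEN.

    Assumes the game began at fullmove number 1 with White to move.
    A FEN has six space-separated fields; field 2 is the side to move
    ("w" or "b") and field 6 is the fullmove number, which increments
    after each Black move.
    """
    fields = fen.split()
    side_to_move = fields[1]
    fullmove_number = int(fields[5])
    # Each completed full move is two plies; add one if White has
    # already moved in the current full move (Black to play).
    return 2 * (fullmove_number - 1) + (1 if side_to_move == "b" else 0)


# Starting position from the example record: no moves played yet.
start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(plies_before_fen(start_fen))
```

For the example record this gives 0, which agrees with its num_prior_moves value.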

Within the text field, special tokens are used as separators:

```python
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}
```

Example record

```json
{
    "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
    "pgn_start": 57,
    "num_sans": 5,
    "num_prior_moves": 0,
    "game_id": "https://lichess.org/PwE2cWn3"
}
```
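Going in the other direction, a record with this shape could be assembled from an FEN and a short list of SANs. This is only a sketch of the format, not the author's preprocessing code: the function `build_record` and its argument names are hypothetical, and pgn_start is assumed to be a zero-based index, as the example record suggests.

```python
from typing import Dict, List, Union

PGN_START = "~"  # separates the FEN from the move list
MOVE_SEP = ">"   # separates consecutive SANs


def build_record(
    fen: str, sans: List[str], num_prior_moves: int, game_id: str
) -> Dict[str, Union[str, int]]:
    """Assemble one record in the layout described by the dataset card."""
    # The card states that each record carries 1 to 10 SANs.
    if not 1 <= len(sans) <= 10:
        raise ValueError("a record holds between 1 and 10 SANs")
    text = fen + PGN_START + MOVE_SEP.join(sans)
    return {
        "text": text,
        # First SAN starts right after the FEN and the "~" token.
        "pgn_start": len(fen) + len(PGN_START),
        "num_sans": len(sans),
        "num_prior_moves": num_prior_moves,
        "game_id": game_id,
    }


rebuilt = build_record(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    sans=["e4", "e6", "d3", "d5", "Nd2"],
    num_prior_moves=0,
    game_id="https://lichess.org/PwE2cWn3",
)
print(rebuilt)
```

With the inputs above, the result matches the example record field for field.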