
TannerGladson/chess-roberta-pretraining

Source: Hugging Face. Updated: 2024-05-12. Indexed: 2024-06-26.
Download link:
https://hf-mirror.com/datasets/TannerGladson/chess-roberta-pretraining
Resource description:
# Dataset Card for Dataset Name

A collection of move sequences from chess games found at https://database.lichess.org/

## Dataset Details

A collection of chess board states and the associated move sequences. PGN files were downloaded from https://database.lichess.org/. Each game was parsed into multiple records, where each record begins with a FEN and is followed by 1 to 10 SANs.

## Uses

This dataset will be used to train ChessRoberta. It should probably not be used to create a high-performing chess model. These games have not been filtered.

## Dataset Structure

Each line is a record, and each record has the following fields:

* text: (Str) the aforementioned string, in which a FEN and several SANs have been concatenated
* pgn_start: (Int) the index of the first character of the first SAN (in the text field's string)
* num_sans: (Int) the number of half-moves contained in the text field's string
* num_prior_moves: (Int) the number of half-moves which elapsed before this record's FEN (remark: black and white each moving once counts as two elapsed moves)
* game_id: (Str) the lichess identifier of the game the record was taken from

Within the text field, special tokens are used as separators:

```
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}
```

### Example Record

```
{
    text: "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2"
    pgn_start: 57
    num_sans: 5
    num_prior_moves: 0
    game_id: "https://lichess.org/PwE2cWn3"
}
```
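To make the `text` layout concrete, here is a minimal sketch of how a record's `text` field could be split back into its FEN and its SAN moves using the two separators. The function name `parse_record_text` is hypothetical and not part of the dataset or of any ChessRoberta code; the sketch only assumes that neither a FEN nor a SAN contains `~` or `>`.

```python
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}


def parse_record_text(text: str) -> tuple[str, list[str]]:
    """Split a record's text field into (fen, list_of_sans).

    Hypothetical helper for illustration; not the dataset author's code.
    """
    # partition() splits at the first "~", which separates the FEN from the moves.
    fen, sep, moves = text.partition(SPECIAL_TOKENS["PGN_START"])
    if not sep:
        raise ValueError("text has no PGN_START separator")
    # The moves are SANs joined by ">".
    sans = moves.split(SPECIAL_TOKENS["MOVE_SEP"]) if moves else []
    return fen, sans


example_text = (
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
    "~e4>e6>d3>d5>Nd2"
)
fen, sans = parse_record_text(example_text)
print(fen)   # the board state before the first listed move
print(sans)  # the half-moves that follow it
```

Applied to the example record, the length of the returned move list matches `num_sans`, and the position just after the `~` matches `pgn_start`.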
Provided by:
TannerGladson
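Summary of the original information

Dataset Card for Dataset Name

Dataset details

A collection of move sequences from chess games, sourced from https://database.lichess.org/. The dataset contains board states and the associated move sequences. PGN files were downloaded from https://database.lichess.org/, and each game was parsed into multiple records, where each record begins with a FEN and is followed by 1 to 10 SANs.

Uses

The dataset will be used to train ChessRoberta. It is not recommended for building a high-performing chess model, because the games have not been filtered.

Dataset structure

Each line is a record, and each record has the following fields:

  • text: (Str) a string in which a FEN and several SANs have been concatenated
  • pgn_start: (Int) the index, within the text field's string, of the first character of the first SAN
  • num_sans: (Int) the number of half-moves contained in the text field's string
  • num_prior_moves: (Int) the number of half-moves that elapsed before this record's FEN (note: black and white each moving once counts as two moves)
  • game_id: (Str) the lichess identifier of the game the record was taken from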
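Because `num_prior_moves` counts half-moves, it can be converted into a side to move and a full-move number. The sketch below shows that arithmetic under a labeled assumption: the game started from the standard initial position with White moving first, which the dataset card does not state explicitly. The function name `side_and_fullmove` is hypothetical.

```python
def side_and_fullmove(num_prior_moves: int) -> tuple[str, int]:
    """Convert a half-move count into (side_to_move, fullmove_number).

    Assumption (not confirmed by the dataset card): the game began from
    the standard starting position, with White making the first half-move.
    """
    if num_prior_moves < 0:
        raise ValueError("num_prior_moves must be non-negative")
    # An even number of elapsed half-moves means it is White's turn.
    side = "w" if num_prior_moves % 2 == 0 else "b"
    # Two half-moves make one full move; numbering starts at 1.
    fullmove = num_prior_moves // 2 + 1
    return side, fullmove


print(side_and_fullmove(0))
print(side_and_fullmove(5))
```

For the example record, `num_prior_moves` is 0 and its FEN reads `w ... 1`, which is consistent with this conversion.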

Within the text field, special tokens are used as separators:

```python
SPECIAL_TOKENS = {
    "PGN_START": "~",
    "MOVE_SEP": ">",
}
```

Example record

```json
{
  "text": "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1~e4>e6>d3>d5>Nd2",
  "pgn_start": 57,
  "num_sans": 5,
  "num_prior_moves": 0,
  "game_id": "https://lichess.org/PwE2cWn3"
}
```
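The example record suggests how the derived fields relate to the text: the FEN is 56 characters long, the `~` separator sits at index 56, and the first SAN therefore starts at zero-based index 57. A minimal sketch of assembling such a record from its parts follows. The helper `build_record` is hypothetical and is not the pipeline that produced the dataset.

```python
def build_record(fen: str, sans: list[str], num_prior_moves: int, game_id: str) -> dict:
    """Assemble a record dict in the layout described by the dataset card.

    Hypothetical helper for illustration only.
    """
    pgn_start_token = "~"  # SPECIAL_TOKENS["PGN_START"]
    move_sep = ">"         # SPECIAL_TOKENS["MOVE_SEP"]
    text = fen + pgn_start_token + move_sep.join(sans)
    return {
        "text": text,
        # Zero-based index of the first SAN character: right after the "~".
        "pgn_start": len(fen) + len(pgn_start_token),
        "num_sans": len(sans),
        "num_prior_moves": num_prior_moves,
        "game_id": game_id,
    }


record = build_record(
    fen="rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    sans=["e4", "e6", "d3", "d5", "Nd2"],
    num_prior_moves=0,
    game_id="https://lichess.org/PwE2cWn3",
)
print(record["pgn_start"], record["num_sans"])
```

Slicing `record["text"][record["pgn_start"]:]` then yields only the move sequence, which is a quick way to check a record's internal consistency.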
