PG-19
Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/deepmind/pg19
Summary:
This dataset is a collection of English poems selected from Project Gutenberg and organized by stanza. Each stanza has exactly four lines (a quatrain) and is paired with its rhyme scheme. The vocabulary is restricted to the 50,000 most common words, and each quatrain has a maximum sequence length of 50. The dataset contains 757,891 quatrains, split into training, validation, and test sets. It has been used to evaluate models on poetry-generation tasks, and its main task is poetry generation and revision with reinforcement learning.
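The constraints stated in the summary (four lines per stanza, a 50,000-word vocabulary, a maximum sequence length of 50) can be illustrated with a minimal sketch. The record layout, the field names `lines` and `rhyme_scheme`, the whitespace tokenizer, and the reading of "sequence length" as a token count are all assumptions made for illustration; the listing does not specify the actual file format or tokenization.

```python
from dataclasses import dataclass
from typing import List, Set

NUM_LINES = 4      # each stanza is a quatrain (from the summary)
MAX_SEQ_LEN = 50   # maximum sequence length (from the summary; unit assumed to be tokens)


@dataclass
class Quatrain:
    """Hypothetical record layout; the real field names are not given in the listing."""
    lines: List[str]
    rhyme_scheme: str  # e.g. "ABAB" (assumed encoding)


def tokenize(text: str) -> List[str]:
    # Assumption: lowercase whitespace tokenization stands in for the real tokenizer.
    return text.lower().split()


def is_valid_quatrain(q: Quatrain, vocab: Set[str]) -> bool:
    """Check a record against the three constraints stated in the summary."""
    if len(q.lines) != NUM_LINES:
        return False
    tokens = [tok for line in q.lines for tok in tokenize(line)]
    if len(tokens) > MAX_SEQ_LEN:
        return False
    # In the real dataset the vocabulary would hold the 50,000 most common words.
    return all(tok in vocab for tok in tokens)


# Toy usage with a tiny stand-in vocabulary.
toy_vocab = {"the", "sun", "goes", "down", "moon", "comes", "up", "and", "we", "sleep"}
example = Quatrain(
    lines=["the sun goes down", "the moon comes up", "the sun goes down", "and we sleep"],
    rhyme_scheme="ABAC",
)
print(is_valid_quatrain(example, toy_vocab))
```

A filter of this kind would run once during preprocessing, before the records are divided into training, validation, and test sets.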
Provider:
Project Gutenberg



