PG-19

arXiv, indexed 2025-09-30
Download link:
https://github.com/deepmind/pg19
Description:
This dataset is a curated collection of English poems drawn from Project Gutenberg and organized by stanza. Each stanza has four lines and is paired with its rhyme scheme. The vocabulary is limited to the 50,000 most common words, and each four-line stanza has a maximum sequence length of 50. The dataset contains 757,891 four-line stanzas, split into training, validation, and test sets. It has been used to evaluate models and tasks related to poetry generation; its primary task is poetry generation and revision with reinforcement learning.
Provided by:
Project Gutenberg
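
The constraints in the description (four-line stanzas, a capped vocabulary, a maximum sequence length of 50) can be illustrated with a minimal Python sketch. It is not the dataset's actual preprocessing: whitespace tokenization, counting length in tokens, dropping stanzas with out-of-vocabulary words (rather than mapping them to an unknown token), and the helper names `build_vocab` and `keep_stanza` are all assumptions made for illustration.

```python
from collections import Counter


def build_vocab(stanzas, max_size=50_000):
    """Return the set of the max_size most frequent whitespace tokens (assumed tokenization)."""
    counts = Counter(
        tok for stanza in stanzas for line in stanza for tok in line.split()
    )
    return {tok for tok, _ in counts.most_common(max_size)}


def keep_stanza(stanza, vocab, max_len=50):
    """Keep a stanza only if it has four lines, fits max_len tokens, and is fully in-vocabulary.

    Assumption: out-of-vocabulary stanzas are dropped; the real pipeline may instead
    replace rare words with an unknown token.
    """
    if len(stanza) != 4:
        return False
    tokens = [tok for line in stanza for tok in line.split()]
    return len(tokens) <= max_len and all(tok in vocab for tok in tokens)


# Toy data (hypothetical, not taken from the dataset).
stanzas = [
    ["the rose is red", "the sky is blue", "the sun is bright", "and so are you"],
    ["only three lines", "in this one", "so it is dropped"],
]
vocab = build_vocab(stanzas)
kept = [s for s in stanzas if keep_stanza(s, vocab)]
```

With the toy input above, only the first stanza passes the filter, because the second has three lines rather than four.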