
Pageshift-Entertainment/LongPage

Source: Hugging Face. Updated 2026-01-20. Indexed 2025-09-13.
Download link:
https://hf-mirror.com/datasets/Pageshift-Entertainment/LongPage
Description:

The LongPage dataset is the first comprehensive dataset for training AI models to write complete novels with sophisticated reasoning. It includes multi-layered planning traces, such as character archetypes, story arcs, world rules, and scene breakdowns, providing a complete cognitive roadmap for long-form narrative construction. The dataset spans from novellas to epic series with consistent quality, featuring books with token counts ranging from 40,000 to 600,000+. It also offers rich structural metadata, including dialogue density, pacing, and narrative focus, to support targeted training curricula. Additionally, the dataset includes an example compose function that transforms it into cold-start SFT→RL workflows for flexible training strategies.
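As a rough illustration of what a compose step like the one described above might do, the sketch below flattens a book's planning layers into a reasoning string and pairs it with the book text as one cold-start SFT sample. It is not the dataset's own compose function: the `BookRecord` layout, its field names, and the helpers `in_token_range` and `compose_sft_example` are all hypothetical, and the real LongPage columns may differ.

```python
from dataclasses import dataclass


@dataclass
class BookRecord:
    # Hypothetical record layout; the real LongPage column names may differ.
    title: str
    token_count: int
    planning_trace: dict  # planning layer name -> layer text (assumed shape)
    text: str


def in_token_range(book: BookRecord, low: int = 40_000, high: int = 600_000) -> bool:
    """Keep books whose token count falls inside a chosen curriculum window."""
    return low <= book.token_count <= high


def compose_sft_example(book: BookRecord) -> dict:
    """Place the flattened planning layers before the book text as reasoning."""
    layers = [f"[{name}]\n{content}" for name, content in book.planning_trace.items()]
    return {
        "prompt": f"Write a complete novel titled '{book.title}'.",
        "reasoning": "\n\n".join(layers),
        "completion": book.text,
    }


# Toy record for illustration only; not taken from the dataset.
book = BookRecord(
    title="Example",
    token_count=45_000,
    planning_trace={
        "world_rules": "Magic costs memory.",
        "scene_breakdown": "1. Arrival",
    },
    text="Chapter 1 ...",
)
sample = compose_sft_example(book) if in_token_range(book) else None
```

Keeping the reasoning in its own field, separate from the completion, would let the same records feed either an SFT stage (reasoning plus text as the target) or a later RL stage that only scores the final text; that split is a design assumption of this sketch, not something the listing specifies.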
Provider:
Pageshift-Entertainment