lmittag/storyforge-sft-v6-dataset

Name: lmittag/storyforge-sft-v6-dataset
Creator: lmittag
Published: 2026-04-27 22:21:45
License: No description provided

Source: Hugging Face | Updated: 2026-04-27 | Indexed: 2026-05-03

Download link:

https://hf-mirror.com/datasets/lmittag/storyforge-sft-v6-dataset

Description:

StoryForge SFT v6 is a multi-task supervised fine-tuning (SFT) dataset for training fiction-writing models. It contains 16,852 training records across three task types: writing records (P1 chapter generation from beats), beat-planning records, and world-bible generation records. The dataset is designed for training Gemma 4 31B-it / E2B-it with QLoRA NF4. It is sourced from 1,052 unique books (audiobook + Royal Road corpus) and uses the production system prompts verbatim. Unlike the earlier v4 and v5 releases, v6 uses production-shape prompts at training time so that the training distribution matches the inference distribution. It has 11× the record count of v4 and 3.3× the book diversity.
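The figures in the description can be cross-checked with a little arithmetic. The sketch below is illustrative only: the constant and function names are made up for this example, the multipliers are the rounded values quoted above, and the baseline for the 3.3× book figure is assumed (not stated) to be v4. The implied earlier counts are therefore rough estimates, not published numbers.

```python
# Figures quoted in the dataset description.
V6_RECORDS = 16_852
V6_BOOKS = 1_052
RECORD_MULTIPLIER_VS_V4 = 11.0  # "11x the record count of v4" (rounded in the source)
BOOK_MULTIPLIER = 3.3           # "3.3x the book diversity"; baseline assumed to be v4


def implied_baseline(current: int, multiplier: float) -> float:
    """Return the earlier count implied by 'current = multiplier * earlier'."""
    return current / multiplier


# Average number of training records drawn from each source book in v6.
records_per_book = V6_RECORDS / V6_BOOKS

# Rough estimates of the earlier release sizes implied by the quoted multipliers.
v4_records_estimate = implied_baseline(V6_RECORDS, RECORD_MULTIPLIER_VS_V4)
earlier_books_estimate = implied_baseline(V6_BOOKS, BOOK_MULTIPLIER)

print(f"records per book (v6): {records_per_book:.1f}")
print(f"implied v4 record count: ~{v4_records_estimate:.0f}")
print(f"implied earlier book count: ~{earlier_books_estimate:.0f}")
```

Under these assumptions, v6 averages about 16 records per book, and the quoted multipliers imply roughly 1,532 records and about 319 books for the earlier baseline.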

Provider:

lmittag
