five

nchapman/figaro-creative-writing

收藏
Hugging Face2026-03-21 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/nchapman/figaro-creative-writing
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: apache-2.0 size_categories: - 1K<n<10K task_categories: - text-generation tags: - creative-writing - fiction - editor-pipeline - synthetic - chat dataset_info: features: - name: messages list: - name: role dtype: string - name: content dtype: string - name: id dtype: int64 splits: - name: train num_examples: 6022 --- # figaro-creative-writing A high-quality creative writing dataset built using an editor feedback pipeline. Each story goes through three stages: DeepSeek V3.2 writes a first draft, Grok 4.1 Fast provides detailed editorial feedback, then DeepSeek revises based on that feedback. The revision is the final output. ## Overview | | | |---|---| | **Rows** | 6,022 | | **Format** | Single-turn chat (system + user + assistant) | | **Prompts** | [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts) | | **Writer model** | DeepSeek V3.2 (temp=1.0, max_tokens=4096) | | **Editor model** | Grok 4.1 Fast (x-ai/grok-4.1-fast) | ## How it was built 1. **Prompts** — 6,022 creative writing prompts from [Gryphe/Opus-WritingPrompts](https://huggingface.co/datasets/Gryphe/Opus-WritingPrompts), each with a genre and title constraint. 2. **Draft** — DeepSeek V3.2 writes a first draft at temperature 1.0. 3. **Editor feedback** — Grok 4.1 Fast reviews the draft as a fiction editor, providing 400-600 words of actionable feedback: identifying flat prose, telling-not-showing, weak endings, and what's working well. The feedback quotes specific lines and suggests concrete rewrites. 4. **Revision** — DeepSeek V3.2 rewrites the story from scratch based on the editor's notes. The revision is the final output — a complete rewrite, not a patch job. ## Why an editor pipeline? A/B testing showed that editor-revised stories score significantly higher on literary quality (avg 7.9/10 vs 4.4/10 for single-shot drafts, as judged by Gemini Pro). The editor pushes DeepSeek away from its default mode (confident, fluent, summarizing) toward more literary fiction (scenic, specific, dramatized). Key design decisions: - **Split model roles** — DeepSeek writes (cheap, fast, strong voice), a different model edits. DeepSeek editing its own work reinforces its blind spots. - **No reward model selection** — Earlier experiments showed Skywork-Reward actively penalizes literary improvements, rewarding chat-style fluency over prose craft. The editor pipeline produces better writing that scores *lower* on reward models. ## System message All rows use the same system message: > You are a creative writing assistant. Write vivid, engaging fiction with clean prose, authentic dialogue, and a compelling voice. ## Schema Each row contains: | Field | Type | Description | |---|---|---| | `messages` | list | Chat messages: system, user, assistant | | `id` | int | Prompt index from the source dataset | ## Intended use Fine-tuning language models for creative fiction writing. Part of the [figaro](https://github.com/nchapman/figaro) training data mix. ## Pipeline Built with [figaro-creative-writing](https://github.com/nchapman/figaro-creative-writing). The pipeline (prepare, generate, upload) is reproducible from that repo.
提供机构:
nchapman
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作