Narrative Motif Engine — corpus, generated stories, and embeddings

Name: Narrative Motif Engine — corpus, generated stories, and embeddings
Creator: Zenodo
Published: 2026-05-04 09:05:48
License: No description available

Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.

Download link:

https://zenodo.org/doi/10.5281/zenodo.20010024

Description:

Companion dataset for the Narrative Motif Engine codebase (github.com/IanMcGarryUL/Narrative-Motif-Engine). It contains the original corpus (970 stories, 9,700 thematic propositions); 106,699 LLM-generated stylistic variations of those propositions, each with a 768-d embedding; 100 pipeline-generated stories and 100 control-generated stories with their canonical (style-erased) embeddings and 22,000 stylistic variations; three independent passes of style-erased canonical vectors for the original corpus; hierarchical and k-NN cluster assignments at k=72; Kernel-PCA scores (PC1-PC16); and geographical metadata mapping countries to regions. The dataset comprises 17 CSV files (~4.3 GB total), plus 2 optional PostgreSQL dumps (~1.4 GB combined). All embeddings are stored as JSON-encoded strings for portability. See docs/dataset_card.md in the codebase for the full file manifest, generation methodology, and reproduction steps.
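Because the embeddings are stored as JSON-encoded strings inside CSV cells, each cell has to be decoded before it can be used numerically. The following is a minimal sketch of that decoding step using only the Python standard library. The column names `proposition_id` and `embedding` are assumptions made for illustration, as is the tiny 3-d sample data; the actual file and column names are listed in docs/dataset_card.md.

```python
import csv
import io
import json

EXPECTED_DIM = 768  # dimensionality stated in the dataset description


def parse_embedding(cell, expected_dim=None):
    """Decode one JSON-encoded embedding cell into a list of floats."""
    vector = [float(x) for x in json.loads(cell)]
    if expected_dim is not None and len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    return vector


def read_embeddings(handle, embedding_column="embedding", expected_dim=None):
    """Yield (row, vector) pairs from an open CSV file handle.

    `embedding_column` is a hypothetical column name; adjust it to match
    the header of the file being read.
    """
    for row in csv.DictReader(handle):
        yield row, parse_embedding(row[embedding_column], expected_dim)


# Tiny in-memory stand-in for one of the CSV files (3-d vectors, not 768-d).
sample = io.StringIO(
    "proposition_id,embedding\n"
    'p1,"[0.1, 0.2, 0.3]"\n'
    'p2,"[0.4, 0.5, 0.6]"\n'
)
vectors = [vec for _, vec in read_embeddings(sample, expected_dim=3)]
```

For the real files, the same function could be called on `open(path, newline="", encoding="utf-8")` with `expected_dim=EXPECTED_DIM`. Reading row by row keeps memory use low, which may matter given the multi-gigabyte file sizes.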

Provider:

Zenodo

Created:

2026-05-04
