Narrative Motif Engine — corpus, generated stories, and embeddings
Source: DataCite Commons. Updated 2026-05-04; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20010023
Description:
Companion dataset for the Narrative Motif Engine codebase (github.com/IanMcGarryUL/Narrative-Motif-Engine).
Contains: the original corpus (970 stories, 9,700 thematic propositions); 106,699 LLM-generated stylistic variations of those propositions, each with a 768-d embedding; 100 pipeline-generated and 100 control-generated stories, with their canonical (style-erased) embeddings and 22,000 stylistic variations; three independent passes of style-erased canonical vectors for the original corpus; hierarchical and k-NN cluster assignments at k=72; Kernel-PCA scores (PC1-PC16); and geographical metadata mapping countries to regions.
17 CSV files (~4.3 GB total), plus 2 optional PostgreSQL dumps (~1.4 GB combined). All embeddings are stored as JSON-encoded strings for portability.
See docs/dataset_card.md in the codebase for the full file manifest, generation methodology, and reproduction steps.
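Because the embeddings are stored as JSON-encoded strings, each embedding cell has to be decoded before use. The following is a minimal sketch of one way to do that with the Python standard library, streaming rows so that multi-gigabyte files need not be loaded at once. The column name `embedding` and the helper names `parse_embedding` and `iter_embeddings` are assumptions made for illustration; the actual column names are listed in docs/dataset_card.md.

```python
import csv
import json
from typing import Iterable, Iterator, List

EXPECTED_DIM = 768  # embedding width stated in the dataset description


def parse_embedding(cell: str, expected_dim: int = EXPECTED_DIM) -> List[float]:
    """Decode one JSON-encoded embedding cell into a list of floats."""
    vector = json.loads(cell)
    if not isinstance(vector, list):
        raise ValueError("embedding cell did not decode to a JSON array")
    if len(vector) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vector)}")
    return [float(x) for x in vector]


def iter_embeddings(
    lines: Iterable[str],
    column: str = "embedding",  # hypothetical column name, check the manifest
    expected_dim: int = EXPECTED_DIM,
) -> Iterator[List[float]]:
    """Yield decoded embeddings row by row from CSV text lines."""
    reader = csv.DictReader(lines)
    for row in reader:
        yield parse_embedding(row[column], expected_dim)
```

A typical call would pass an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to one of the CSV files. The length check is optional but catches truncated or mis-exported rows early.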
Provider:
Zenodo
Created:
2026-05-04



