
Narrative Motif Engine — corpus, generated stories, and embeddings

Source: DataCite Commons. Updated: 2026-05-04. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20010023
Description:
Companion dataset for the Narrative Motif Engine codebase (github.com/IanMcGarryUL/Narrative-Motif-Engine). It contains the original corpus (970 stories, 9,700 thematic propositions); 106,699 LLM-generated stylistic variations of those propositions (each with a 768-d embedding); 100 pipeline-generated stories and 100 control-generated stories with their canonical (style-erased) embeddings and 22,000 stylistic variations; three independent passes of style-erased canonical vectors for the original corpus; hierarchical and k-NN cluster assignments at k=72; Kernel-PCA scores (PC1-PC16); and geographical metadata mapping countries to regions. The dataset comprises 17 CSV files (~4.3 GB total), plus 2 optional PostgreSQL dumps (~1.4 GB combined). All embeddings are stored as JSON-encoded strings for portability. See docs/dataset_card.md in the codebase for the full file manifest, generation methodology, and reproduction steps.
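Because the embeddings are stored as JSON-encoded strings inside CSV cells, each cell has to be decoded before it can be used as a vector. The following is a minimal sketch of that step using only the Python standard library. The column names `proposition_id` and `embedding` are hypothetical placeholders, not taken from the dataset; the actual file and column names are listed in docs/dataset_card.md.

```python
import csv
import io
import json


def parse_embedding(cell, expected_dim=768):
    """Decode one JSON-encoded embedding cell into a list of floats."""
    vec = json.loads(cell)
    if len(vec) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(vec)}")
    return [float(x) for x in vec]


def read_embeddings(handle, column="embedding", expected_dim=768):
    """Yield (row, vector) pairs from an open CSV file handle.

    `column` is an assumed name; check the dataset card for the real one.
    """
    for row in csv.DictReader(handle):
        yield row, parse_embedding(row[column], expected_dim)


# Tiny in-memory stand-in for one of the CSV files (3-d instead of 768-d).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["proposition_id", "embedding"])
writer.writerow(["p1", json.dumps([0.1, 0.2, 0.3])])
buf.seek(0)

rows = list(read_embeddings(buf, expected_dim=3))
```

For the real files, the same function could be applied to a handle from `open(path, newline="", encoding="utf-8")`. Reading row by row rather than loading a whole file at once may be preferable given the ~4.3 GB total size.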
Provider:
Zenodo
Created:
2026-05-04