brandburner/doctorwho-s13-narrative-kg

Hugging Face | Updated: 2026-04-08 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/brandburner/doctorwho-s13-narrative-kg
Resource description:
---
license: cc-by-sa-4.0
task_categories:
- graph-ml
- text-generation
tags:
- narrative
- knowledge-graph
- screenplay
- fabula
- neo4j
- graph-gravity
size_categories:
- 1K<n<10K
---

# Doctor Who - Narrative Knowledge Graph

A rich narrative knowledge graph extracted from *Doctor Who* screenplays using the [Fabula](https://fabula.productions) pipeline. Contains characters, locations, objects, organizations, events, themes, and conflict arcs with full participation semantics and Graph Gravity importance tiers.

## Dataset Overview

| Metric | Value |
|--------|-------|
| Source database | `doctorwho.s13` |
| Type | Season database |
| Episodes | 26 |
| Total nodes | 4,098 |
| Total edges | 12,522 |
| Schema version | 1.1.0 |
| Exported | 2026-04-08 |

### Entity Breakdown

| Type | Count |
|------|-------|
| Act | 71 |
| Agent | 154 |
| ConflictArc | 137 |
| Episode | 26 |
| Event | 746 |
| Location | 268 |
| Object | 613 |
| Organization | 46 |
| PlotBeat | 1,400 |
| SceneBoundary | 518 |
| Theme | 119 |

### Graph Gravity Tiers

| Tier | Count | Description |
|------|-------|-------------|
| anchor | 18 | Main characters / key locations |
| planet | 266 | Recurring entities |
| asteroid | 797 | Minor / one-off entities |

### Relationship Types

`AFFILIATED_WITH`, `BELONGS_TO_EPISODE`, `CALLBACK`, `CAUSAL`, `CHARACTER_CONTINUITY`, `CONTAINS_ACT`, `CONTAINS_BEAT`, `CONTAINS_SCENE`, `EMOTIONAL_ECHO`, `ESCALATION`, `EXEMPLIFIES_THEME`, `FORESHADOWING`, `INVOLVED_IN_ARC`, `INVOLVED_WITH`, `IN_EVENT`, `NARRATIVELY_FOLLOWS`, `OCCURS_IN`, `PARTICIPATED_AS`, `PART_OF`, `PART_OF_ACT` ... and 6 more

## Related Datasets

This is a **single-season** dataset containing entities and events as extracted from Season 13 screenplays.

- **Megagraph** (all seasons unified): [brandburner/doctorwho-mega-narrative-kg](https://huggingface.co/datasets/brandburner/doctorwho-mega-narrative-kg)

> **Note:** The megagraph is *not* a simple union of season datasets. Cross-season entities are reconciled through a Global Entity Registry (GER), receiving new canonical UUIDs and distilled descriptions. Graph Gravity tiers are recalculated across all episodes. Use individual season datasets for single-season analysis; use the megagraph for cross-season analysis.

## Files

| File | Description |
|------|-------------|
| `nodes.parquet` | All graph nodes with properties |
| `edges.parquet` | All relationships with properties |
| `positions.parquet` | 3D layout coordinates for visualization |
| `meta.json` | Dataset metadata and entity counts |

## Schema

### Nodes (`nodes.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Unique node identifier (UUID) |
| `primary_label` | string | Node type (Agent, Location, Event, etc.) |
| `name` | string | Display name |
| `description` | string | Foundational description |
| `tier` | string (nullable) | Graph Gravity tier: anchor / planet / asteroid |
| `episode_count` | int (nullable) | Number of distinct episodes entity appears in |
| `first_episode_seq` | int (nullable) | First appearance episode |
| `last_episode_seq` | int (nullable) | Last appearance episode |
| `properties_json` | string | Full node properties as JSON |

### Edges (`edges.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `source_node_id` | string | Source node UUID |
| `target_node_id` | string | Target node UUID |
| `relationship_type` | string | Relationship type (e.g., PARTICIPATED_AS) |
| `properties_json` | string | Edge properties as JSON |

### Positions (`positions.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Node UUID |
| `x`, `y`, `z` | float | 3D coordinates |
| `size` | float | Node size (Graph Gravity weighted) |
| `r`, `g`, `b` | int | RGB color by entity type |
| `community` | int | Louvain community index |
| `tier` | string (nullable) | Graph Gravity tier |

## Usage

```python
from datasets import load_dataset
import pandas as pd

# Load from HuggingFace
ds = load_dataset("brandburner/doctorwho-s13-narrative-kg")

# Or load parquet directly
nodes = pd.read_parquet("nodes.parquet")
edges = pd.read_parquet("edges.parquet")

# Filter to anchor characters
anchors = nodes[(nodes['primary_label'] == 'Agent') & (nodes['tier'] == 'anchor')]

# Build a NetworkX graph
import networkx as nx

G = nx.DiGraph()
for _, n in nodes.iterrows():
    G.add_node(n['node_id'], label=n['primary_label'], name=n['name'])
for _, e in edges.iterrows():
    G.add_edge(e['source_node_id'], e['target_node_id'], type=e['relationship_type'])
```

## Citation

```bibtex
@misc{fabula_doctorwho_s13,
  title = {Doctor Who Narrative Knowledge Graph},
  author = {Fabula Pipeline},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/brandburner/doctorwho-s13-narrative-kg}}
}
```

## License

CC BY-SA 4.0
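As a supplement to the edge schema, the `properties_json` column stores each row's properties as a JSON string, so it has to be decoded before the properties can be inspected. The sketch below is a minimal illustration using only the standard library. The sample rows and the `role` property key are hypothetical placeholders, not values taken from the dataset; real rows could be obtained with `edges.to_dict("records")` after `pd.read_parquet("edges.parquet")`.

```python
import json
from collections import Counter

# Hypothetical sample rows that follow the documented edges.parquet columns.
# The IDs and the "role" key are invented for illustration only.
sample_edges = [
    {"source_node_id": "a1", "target_node_id": "e1",
     "relationship_type": "PARTICIPATED_AS",
     "properties_json": '{"role": "example"}'},
    {"source_node_id": "a2", "target_node_id": "e1",
     "relationship_type": "PARTICIPATED_AS",
     "properties_json": "{}"},
    {"source_node_id": "e1", "target_node_id": "l1",
     "relationship_type": "OCCURS_IN",
     "properties_json": "{}"},
]


def decode_properties(row):
    """Return the row's properties_json as a dict (empty if absent or blank)."""
    raw = row.get("properties_json")
    if isinstance(raw, str) and raw:
        return json.loads(raw)
    return {}


def count_relationship_types(rows):
    """Tally how many edges carry each relationship_type value."""
    return Counter(row["relationship_type"] for row in rows)


rel_counts = count_relationship_types(sample_edges)
first_props = decode_properties(sample_edges[0])
```

The same `decode_properties` helper would apply to `nodes.parquet`, whose `properties_json` column is documented with the same type.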

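Because `positions.parquet` shares the `node_id` key with `nodes.parquet`, the layout table can be joined onto the node table, for example to list node names per Louvain `community`. The following is a rough sketch under that assumption, written with plain dictionaries so it runs without the dataset files. All sample rows are hypothetical; the `tier` of the PlotBeat row is set to `None` only because the schema marks that column as nullable.

```python
from collections import defaultdict

# Hypothetical rows following the documented nodes.parquet columns.
sample_nodes = [
    {"node_id": "n1", "primary_label": "Agent", "name": "Agent A", "tier": "anchor"},
    {"node_id": "n2", "primary_label": "Location", "name": "Location B", "tier": "planet"},
    {"node_id": "n3", "primary_label": "PlotBeat", "name": "Beat C", "tier": None},
]

# Hypothetical rows following the documented positions.parquet columns.
sample_positions = [
    {"node_id": "n1", "x": 0.0, "y": 1.0, "z": 2.0, "community": 0},
    {"node_id": "n2", "x": 3.0, "y": 4.0, "z": 5.0, "community": 0},
    {"node_id": "n3", "x": 6.0, "y": 7.0, "z": 8.0, "community": 1},
]


def names_by_community(nodes, positions):
    """Group node names by community index, joining the tables on node_id."""
    community_of = {p["node_id"]: p["community"] for p in positions}
    groups = defaultdict(list)
    for node in nodes:
        community = community_of.get(node["node_id"])
        if community is not None:  # skip nodes that have no layout row
            groups[community].append(node["name"])
    return dict(groups)


communities = names_by_community(sample_nodes, sample_positions)
```

With pandas, an equivalent join could be written as `nodes.merge(positions, on="node_id")`; note that both tables carry a `tier` column, so pandas would suffix the duplicated name.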
Provider:
brandburner