brandburner/doctorwho-s13-narrative-kg
Download link:
https://hf-mirror.com/datasets/brandburner/doctorwho-s13-narrative-kg
---
license: cc-by-sa-4.0
task_categories:
- graph-ml
- text-generation
tags:
- narrative
- knowledge-graph
- screenplay
- fabula
- neo4j
- graph-gravity
size_categories:
- 1K<n<10K
---
# Doctor Who - Narrative Knowledge Graph
A narrative knowledge graph extracted from *Doctor Who* Season 13 screenplays using the
[Fabula](https://fabula.productions) pipeline. It contains characters,
locations, objects, organizations, events, themes, and conflict arcs, with full
participation semantics and Graph Gravity importance tiers.
## Dataset Overview
| Metric | Value |
|--------|-------|
| Source database | `doctorwho.s13` |
| Type | Season database |
| Episodes | 26 |
| Total nodes | 4,098 |
| Total edges | 12,522 |
| Schema version | 1.1.0 |
| Exported | 2026-04-08 |
### Entity Breakdown
| Type | Count |
|------|-------|
| Act | 71 |
| Agent | 154 |
| ConflictArc | 137 |
| Episode | 26 |
| Event | 746 |
| Location | 268 |
| Object | 613 |
| Organization | 46 |
| PlotBeat | 1,400 |
| SceneBoundary | 518 |
| Theme | 119 |
### Graph Gravity Tiers
| Tier | Count | Description |
|------|-------|-------------|
| anchor | 18 | Main characters / key locations |
| planet | 266 | Recurring entities |
| asteroid | 797 | Minor / one-off entities |
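The three tier counts sum to 1,081, well below the 4,098 total nodes, so most nodes have a null `tier`. That sum also equals the combined Agent, Location, Object, and Organization counts in the Entity Breakdown table. The sketch below only checks this arithmetic with numbers copied from the tables above. Reading it as "only those four entity types are tiered" is an inference from the totals, not something this card states.

```python
# Counts copied from the Entity Breakdown and Graph Gravity Tiers tables.
entity_counts = {
    "Act": 71,
    "Agent": 154,
    "ConflictArc": 137,
    "Episode": 26,
    "Event": 746,
    "Location": 268,
    "Object": 613,
    "Organization": 46,
    "PlotBeat": 1400,
    "SceneBoundary": 518,
    "Theme": 119,
}
tier_counts = {"anchor": 18, "planet": 266, "asteroid": 797}

total_nodes = sum(entity_counts.values())
tiered_nodes = sum(tier_counts.values())

# Assumption (not stated in the card): these are the entity types that carry a tier.
assumed_tiered_types = ["Agent", "Location", "Object", "Organization"]
assumed_tiered_total = sum(entity_counts[t] for t in assumed_tiered_types)

print(total_nodes)           # 4098
print(tiered_nodes)          # 1081
print(assumed_tiered_total)  # 1081
```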
### Relationship Types
`AFFILIATED_WITH`, `BELONGS_TO_EPISODE`, `CALLBACK`, `CAUSAL`, `CHARACTER_CONTINUITY`, `CONTAINS_ACT`, `CONTAINS_BEAT`, `CONTAINS_SCENE`, `EMOTIONAL_ECHO`, `ESCALATION`, `EXEMPLIFIES_THEME`, `FORESHADOWING`, `INVOLVED_IN_ARC`, `INVOLVED_WITH`, `IN_EVENT`, `NARRATIVELY_FOLLOWS`, `OCCURS_IN`, `PARTICIPATED_AS`, `PART_OF`, `PART_OF_ACT` ... and 6 more
## Related Datasets
This is a **single-season** dataset containing the entities and events extracted from the Season 13 screenplays.
- **Megagraph** (all seasons unified): [brandburner/doctorwho-mega-narrative-kg](https://huggingface.co/datasets/brandburner/doctorwho-mega-narrative-kg)
> **Note:** The megagraph is *not* a simple union of season datasets. Cross-season entities are reconciled through a Global Entity Registry (GER), receiving new canonical UUIDs and distilled descriptions. Graph Gravity tiers are recalculated across all episodes. Use individual season datasets for single-season analysis; use the megagraph for cross-season analysis.
## Files
| File | Description |
|------|-------------|
| `nodes.parquet` | All graph nodes with properties |
| `edges.parquet` | All relationships with properties |
| `positions.parquet` | 3D layout coordinates for visualization |
| `meta.json` | Dataset metadata and entity counts |
## Schema
### Nodes (`nodes.parquet`)
| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Unique node identifier (UUID) |
| `primary_label` | string | Node type (Agent, Location, Event, etc.) |
| `name` | string | Display name |
| `description` | string | Foundational description |
| `tier` | string (nullable) | Graph Gravity tier: anchor / planet / asteroid |
| `episode_count` | int (nullable) | Number of distinct episodes entity appears in |
| `first_episode_seq` | int (nullable) | First appearance episode |
| `last_episode_seq` | int (nullable) | Last appearance episode |
| `properties_json` | string | Full node properties as JSON |
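Because `properties_json` is a JSON string rather than a nested column, it has to be decoded before its fields can be used. The helper below is a minimal sketch using only the standard library. The keys stored inside `properties_json` are not documented in this card, so `example_key` and the whole toy row are made up for illustration.

```python
import json

def decode_properties(raw):
    """Parse a properties_json cell into a dict; non-string or empty cells become {}."""
    if not isinstance(raw, str) or not raw:
        return {}
    return json.loads(raw)

# Made-up row for illustration only; not taken from the dataset.
toy_node = {
    "node_id": "00000000-0000-0000-0000-000000000001",
    "primary_label": "Agent",
    "name": "Example Character",
    "tier": None,  # tier is a nullable column
    "properties_json": '{"example_key": "example value"}',
}

props = decode_properties(toy_node["properties_json"])
print(props["example_key"])  # example value
```

With the `nodes` DataFrame from the Usage section, `nodes["properties_json"].map(decode_properties)` would apply the same helper to every row.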
### Edges (`edges.parquet`)
| Column | Type | Description |
|--------|------|-------------|
| `source_node_id` | string | Source node UUID |
| `target_node_id` | string | Target node UUID |
| `relationship_type` | string | Relationship type (e.g., PARTICIPATED_AS) |
| `properties_json` | string | Edge properties as JSON |
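Edges refer to nodes only by UUID, so most queries need a lookup back into the node table. The sketch below counts edges per relationship type and resolves the endpoints of one type to display names. It works on plain row dictionaries (the shape `DataFrame.to_dict("records")` produces). The toy rows and their edge directions are invented, since the card does not document which way each relationship points.

```python
from collections import Counter

def count_relationship_types(edges):
    """Count edges per relationship_type."""
    return Counter(e["relationship_type"] for e in edges)

def named_edges(edges, nodes, relationship_type):
    """Return (source name, target name) pairs for one relationship type."""
    name_by_id = {n["node_id"]: n["name"] for n in nodes}
    return [
        (name_by_id.get(e["source_node_id"]), name_by_id.get(e["target_node_id"]))
        for e in edges
        if e["relationship_type"] == relationship_type
    ]

# Made-up rows for illustration; IDs, names, and directions are hypothetical.
toy_nodes = [
    {"node_id": "a", "name": "Example Agent"},
    {"node_id": "b", "name": "Example Location"},
    {"node_id": "c", "name": "Example Event"},
]
toy_edges = [
    {"source_node_id": "c", "target_node_id": "b", "relationship_type": "OCCURS_IN"},
    {"source_node_id": "a", "target_node_id": "c", "relationship_type": "IN_EVENT"},
    {"source_node_id": "a", "target_node_id": "b", "relationship_type": "OCCURS_IN"},
]

type_counts = count_relationship_types(toy_edges)
occurs_in = named_edges(toy_edges, toy_nodes, "OCCURS_IN")
print(type_counts["OCCURS_IN"])  # 2
print(occurs_in[0])              # ('Example Event', 'Example Location')
```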
### Positions (`positions.parquet`)
| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Node UUID |
| `x`, `y`, `z` | float | 3D coordinates |
| `size` | float | Node size (Graph Gravity weighted) |
| `r`, `g`, `b` | int | RGB color by entity type |
| `community` | int | Louvain community index |
| `tier` | string (nullable) | Graph Gravity tier |
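To draw the layout, each position row has to be joined to its node by `node_id`, and the separate `r`, `g`, `b` columns usually need to be combined into one color value. The following is a rough sketch on plain row dictionaries. It assumes the color channels are integers in the 0 to 255 range, which the card does not state, and the toy rows are invented.

```python
def rgb_to_hex(r, g, b):
    """Format integer RGB channels (assumed 0-255) as a CSS-style hex color."""
    r, g, b = int(r), int(g), int(b)
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{r:02x}{g:02x}{b:02x}"

def plot_points(positions, nodes):
    """Join layout rows to node names by node_id and attach a hex color."""
    name_by_id = {n["node_id"]: n["name"] for n in nodes}
    return [
        {
            "name": name_by_id.get(p["node_id"]),
            "xyz": (p["x"], p["y"], p["z"]),
            "size": p["size"],
            "color": rgb_to_hex(p["r"], p["g"], p["b"]),
        }
        for p in positions
    ]

# Made-up rows for illustration only.
toy_nodes = [{"node_id": "a", "name": "Example Agent"}]
toy_positions = [
    {"node_id": "a", "x": 1.0, "y": -2.5, "z": 0.0, "size": 3.0,
     "r": 255, "g": 0, "b": 128, "community": 0, "tier": "anchor"},
]

points = plot_points(toy_positions, toy_nodes)
print(points[0]["color"])  # #ff0080
```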
## Usage
```python
from datasets import load_dataset
import networkx as nx
import pandas as pd

# Load from Hugging Face
ds = load_dataset("brandburner/doctorwho-s13-narrative-kg")

# Or load the parquet files directly
nodes = pd.read_parquet("nodes.parquet")
edges = pd.read_parquet("edges.parquet")

# Filter to anchor characters
anchors = nodes[(nodes['primary_label'] == 'Agent') & (nodes['tier'] == 'anchor')]

# Build a NetworkX graph
G = nx.DiGraph()
for _, n in nodes.iterrows():
    G.add_node(n['node_id'], label=n['primary_label'], name=n['name'])
for _, e in edges.iterrows():
    G.add_edge(e['source_node_id'], e['target_node_id'], type=e['relationship_type'])
```
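One caveat with `nx.DiGraph`: it stores at most one edge per ordered node pair, so a second `add_edge` call for the same source and target overwrites the earlier edge's `type` attribute. Whether this dataset contains such parallel edges is not stated in the card. The sketch below, which uses only the standard library and made-up rows, shows one way to check. If any turn up, `nx.MultiDiGraph` keeps them as separate edges.

```python
from collections import defaultdict

def parallel_edge_pairs(edges):
    """Map each (source, target) pair with more than one edge to its relationship types."""
    types_by_pair = defaultdict(list)
    for e in edges:
        pair = (e["source_node_id"], e["target_node_id"])
        types_by_pair[pair].append(e["relationship_type"])
    return {pair: types for pair, types in types_by_pair.items() if len(types) > 1}

# Made-up rows for illustration; IDs and directions are hypothetical.
toy_edges = [
    {"source_node_id": "a", "target_node_id": "b", "relationship_type": "CAUSAL"},
    {"source_node_id": "a", "target_node_id": "b", "relationship_type": "ESCALATION"},
    {"source_node_id": "b", "target_node_id": "c", "relationship_type": "CALLBACK"},
]

duplicates = parallel_edge_pairs(toy_edges)
print(duplicates)  # {('a', 'b'): ['CAUSAL', 'ESCALATION']}
```

With the DataFrame loaded above, `edges.to_dict("records")` yields rows in the shape this helper expects.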
## Citation
```bibtex
@misc{fabula_doctorwho_s13,
  title = {Doctor Who Narrative Knowledge Graph},
  author = {Fabula Pipeline},
  year = {2026},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/brandburner/doctorwho-s13-narrative-kg}}
}
```
## License
CC BY-SA 4.0
Provider: brandburner



