
brandburner/doctorwho-mega-narrative-kg

Hugging Face | Updated 2026-04-08 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/brandburner/doctorwho-mega-narrative-kg
Resource summary:
---
license: cc-by-sa-4.0
task_categories:
- graph-ml
- text-generation
tags:
- narrative
- knowledge-graph
- screenplay
- fabula
- neo4j
- graph-gravity
size_categories:
- 1K<n<10K
---

# Doctor Who - Narrative Knowledge Graph

A rich narrative knowledge graph extracted from *Doctor Who* screenplays using the [Fabula](https://fabula.productions) pipeline. Contains characters, locations, objects, organizations, events, themes, and conflict arcs with full participation semantics and Graph Gravity importance tiers.

## Dataset Overview

| Metric | Value |
|--------|-------|
| Source database | `doctorwho.mega` |
| Type | Megagraph (cross-season merged) |
| Episodes | 703 |
| Seasons merged | 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 |
| Total nodes | 108,231 |
| Total edges | 314,654 |
| Schema version | 1.1.0 |
| Exported | 2026-04-08 |

### Entity Breakdown

| Type | Count |
|------|-------|
| Act | 1,897 |
| Agent | 3,948 |
| ConflictArc | 3,144 |
| Episode | 703 |
| Event | 20,497 |
| Location | 5,819 |
| Object | 15,986 |
| Organization | 1,452 |
| PlotBeat | 38,087 |
| SceneBoundary | 14,477 |
| Theme | 2,221 |

### Graph Gravity Tiers

| Tier | Count | Description |
|------|-------|-------------|
| anchor | 471 | Main characters / key locations |
| planet | 5,616 | Recurring entities |
| asteroid | 21,118 | Minor / one-off entities |

### Relationship Types

`AFFILIATED_WITH`, `BELONGS_TO_EPISODE`, `CALLBACK`, `CAUSAL`, `CHARACTER_CONTINUITY`, `CONTAINS_BEAT`, `EMOTIONAL_ECHO`, `ESCALATION`, `EXEMPLIFIES_THEME`, `FORESHADOWING`, `INVOLVED_IN_ARC`, `INVOLVED_WITH`, `IN_EVENT`, `MERGED_INTO`, `NARRATIVELY_FOLLOWS`, `OCCURS_IN`, `PARTICIPATED_AS`, `PART_OF`, `PART_OF_ACT`, `PART_OF_ARC` ... and 5 more

## Megagraph vs. Season Datasets

This is a **megagraph**: a cross-season unified knowledge graph, not a simple concatenation of per-season datasets.
Key differences from individual season datasets:

- **Unified entity identities**: Cross-season entities (recurring characters, locations, organizations) are reconciled through a Global Entity Registry (GER) and assigned new canonical UUIDs. A recurring entity therefore has a *different* `node_id` here than in any individual season dataset.
- **Distilled descriptions**: Entity descriptions may be rewritten during GER reconciliation to reflect a character's full arc rather than a single season's perspective.
- **Cross-season Graph Gravity**: Tier assignments (anchor/planet/asteroid) reflect importance across all 703 episodes. An entity that is "planet" tier in one season may become "anchor" in the megagraph because it recurs across multiple seasons.
- **Season-unique entities preserved**: Entities appearing in only one season are transferred with their original UUIDs and properties.
- **Cross-season relationship topology**: The megagraph contains participation and narrative connection patterns that span season boundaries.
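Because of these rules, joining a season dataset to the megagraph on `node_id` alone misses every reconciled entity. Recurring entities get new UUIDs, and only season-unique entities keep theirs.

The following is a minimal sketch of one possible fallback. It is a hypothetical helper (`map_season_to_mega`), not part of the Fabula pipeline or the GER:

- It first tries an exact `node_id` match.
- Otherwise it falls back to an exact `(primary_label, name)` match, and accepts that match only when it is unambiguous.
- Rows are assumed to be plain dicts, for example from pandas `DataFrame.to_dict("records")`.
- The sample rows are made up.

Name matching is a heuristic. GER may rename or distill entities, so some rows are expected to stay unmatched (`None`).

```python
from collections import defaultdict
from typing import Iterable, Mapping, Optional


def map_season_to_mega(
    season_nodes: Iterable[Mapping],
    mega_nodes: Iterable[Mapping],
) -> dict[str, Optional[str]]:
    """Map season-dataset node_ids to megagraph node_ids (hypothetical helper).

    1. Season-unique entities keep their UUID, so try an exact node_id match.
    2. Otherwise fall back to (primary_label, name), but only when exactly one
       megagraph node has that key; ambiguous or unknown keys map to None.
    """
    mega_ids = set()
    by_key: dict[tuple, list[str]] = defaultdict(list)
    for row in mega_nodes:
        mega_ids.add(row["node_id"])
        by_key[(row["primary_label"], row["name"])].append(row["node_id"])

    mapping: dict[str, Optional[str]] = {}
    for row in season_nodes:
        sid = row["node_id"]
        if sid in mega_ids:
            mapping[sid] = sid  # UUID preserved: season-unique entity
            continue
        candidates = by_key.get((row["primary_label"], row["name"]), [])
        mapping[sid] = candidates[0] if len(candidates) == 1 else None
    return mapping


# Made-up sample rows (not taken from the dataset)
season = [
    {"node_id": "s-1", "primary_label": "Agent", "name": "The Doctor"},
    {"node_id": "u-7", "primary_label": "Object", "name": "Hand Mirror"},
    {"node_id": "s-2", "primary_label": "Agent", "name": "Guard"},
]
mega = [
    {"node_id": "m-1", "primary_label": "Agent", "name": "The Doctor"},
    {"node_id": "u-7", "primary_label": "Object", "name": "Hand Mirror"},
    {"node_id": "m-2", "primary_label": "Agent", "name": "Guard"},
    {"node_id": "m-3", "primary_label": "Agent", "name": "Guard"},
]
print(map_season_to_mega(season, mega))
# → {'s-1': 'm-1', 'u-7': 'u-7', 's-2': None}
```

The ambiguous "Guard" row is left unmapped rather than guessed. A real join would likely add extra disambiguators, such as `first_episode_seq`.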
**For single-season analysis**, use the individual season datasets:

- [Season 1](https://huggingface.co/datasets/brandburner/doctorwho-s01-narrative-kg)
- [Season 2](https://huggingface.co/datasets/brandburner/doctorwho-s02-narrative-kg)
- [Season 3](https://huggingface.co/datasets/brandburner/doctorwho-s03-narrative-kg)
- [Season 4](https://huggingface.co/datasets/brandburner/doctorwho-s04-narrative-kg)
- [Season 5](https://huggingface.co/datasets/brandburner/doctorwho-s05-narrative-kg)
- [Season 6](https://huggingface.co/datasets/brandburner/doctorwho-s06-narrative-kg)
- [Season 7](https://huggingface.co/datasets/brandburner/doctorwho-s07-narrative-kg)
- [Season 8](https://huggingface.co/datasets/brandburner/doctorwho-s08-narrative-kg)
- [Season 9](https://huggingface.co/datasets/brandburner/doctorwho-s09-narrative-kg)
- [Season 10](https://huggingface.co/datasets/brandburner/doctorwho-s10-narrative-kg)
- [Season 11](https://huggingface.co/datasets/brandburner/doctorwho-s11-narrative-kg)
- [Season 12](https://huggingface.co/datasets/brandburner/doctorwho-s12-narrative-kg)
- [Season 13](https://huggingface.co/datasets/brandburner/doctorwho-s13-narrative-kg)
- [Season 14](https://huggingface.co/datasets/brandburner/doctorwho-s14-narrative-kg)
- [Season 15](https://huggingface.co/datasets/brandburner/doctorwho-s15-narrative-kg)
- [Season 16](https://huggingface.co/datasets/brandburner/doctorwho-s16-narrative-kg)
- [Season 17](https://huggingface.co/datasets/brandburner/doctorwho-s17-narrative-kg)
- [Season 18](https://huggingface.co/datasets/brandburner/doctorwho-s18-narrative-kg)
- [Season 19](https://huggingface.co/datasets/brandburner/doctorwho-s19-narrative-kg)
- [Season 20](https://huggingface.co/datasets/brandburner/doctorwho-s20-narrative-kg)
- [Season 21](https://huggingface.co/datasets/brandburner/doctorwho-s21-narrative-kg)
- [Season 22](https://huggingface.co/datasets/brandburner/doctorwho-s22-narrative-kg)
- [Season 23](https://huggingface.co/datasets/brandburner/doctorwho-s23-narrative-kg)
- [Season 24](https://huggingface.co/datasets/brandburner/doctorwho-s24-narrative-kg)
- [Season 25](https://huggingface.co/datasets/brandburner/doctorwho-s25-narrative-kg)
- [Season 26](https://huggingface.co/datasets/brandburner/doctorwho-s26-narrative-kg)

**For cross-season analysis** (character arcs, thematic evolution, entity importance across the full series), use this megagraph.

## Files

| File | Description |
|------|-------------|
| `nodes.parquet` | All graph nodes with properties |
| `edges.parquet` | All relationships with properties |
| `positions.parquet` | 3D layout coordinates for visualization |
| `meta.json` | Dataset metadata and entity counts |

## Schema

### Nodes (`nodes.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Unique node identifier (UUID) |
| `primary_label` | string | Node type (Agent, Location, Event, etc.) |
| `name` | string | Display name |
| `description` | string | Foundational description |
| `tier` | string (nullable) | Graph Gravity tier: anchor / planet / asteroid |
| `episode_count` | int (nullable) | Number of distinct episodes the entity appears in |
| `first_episode_seq` | int (nullable) | Sequence number of the episode in which the entity first appears |
| `last_episode_seq` | int (nullable) | Sequence number of the episode in which the entity last appears |
| `properties_json` | string | Full node properties as JSON |

### Edges (`edges.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `source_node_id` | string | Source node UUID |
| `target_node_id` | string | Target node UUID |
| `relationship_type` | string | Relationship type (e.g., PARTICIPATED_AS) |
| `properties_json` | string | Edge properties as JSON |

### Positions (`positions.parquet`)

| Column | Type | Description |
|--------|------|-------------|
| `node_id` | string | Node UUID |
| `x`, `y`, `z` | float | 3D coordinates |
| `size` | float | Node size (Graph Gravity weighted) |
| `r`, `g`, `b` | int | RGB color by entity type |
| `community` | int | Louvain community index |
| `tier` | string (nullable) | Graph Gravity tier |

## Usage

```python
from datasets import load_dataset
import networkx as nx
import pandas as pd

# Load from Hugging Face. The parquet files have different columns,
# so select one file at a time with data_files.
nodes_ds = load_dataset(
    "brandburner/doctorwho-mega-narrative-kg",
    data_files="nodes.parquet",
    split="train",
)

# Or load local copies of the parquet files directly
nodes = pd.read_parquet("nodes.parquet")
edges = pd.read_parquet("edges.parquet")

# Filter to anchor-tier characters
anchors = nodes[(nodes["primary_label"] == "Agent") & (nodes["tier"] == "anchor")]

# Build a NetworkX graph. MultiDiGraph keeps parallel edges of different
# relationship types between the same pair of nodes (DiGraph would overwrite them).
G = nx.MultiDiGraph()
for _, n in nodes.iterrows():
    G.add_node(n["node_id"], label=n["primary_label"], name=n["name"])
for _, e in edges.iterrows():
    G.add_edge(e["source_node_id"], e["target_node_id"], type=e["relationship_type"])
```

## Citation

```bibtex
@misc{fabula_doctorwho_mega,
  title        = {Doctor Who Narrative Knowledge Graph},
  author       = {Fabula Pipeline},
  year         = {2026},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/datasets/brandburner/doctorwho-mega-narrative-kg}}
}
```

## License

CC BY-SA 4.0
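The `properties_json` columns in `nodes.parquet` and `edges.parquet` store the full properties as JSON strings, so they need decoding before use. Below is a minimal stdlib sketch. It assumes:

- each value is a JSON object or null;
- rows are plain dicts, such as those from `DataFrame.to_dict("records")`.

The helper names are hypothetical, and the sample rows and their property keys are made up. The sketch also tallies edges per `relationship_type` and tiers per `primary_label`.

One observation from the published tables: the tier counts (471 + 5,616 + 21,118 = 27,205) equal the combined Agent, Location, Object and Organization counts (3,948 + 5,819 + 15,986 + 1,452 = 27,205). This suggests, though the card does not say so, that tiers are assigned only to those four node types.

```python
import json
from collections import Counter
from typing import Any, Iterable, Mapping


def decode_properties(row: Mapping[str, Any]) -> dict[str, Any]:
    """Parse a row's properties_json; null, NaN, empty or non-object -> {}."""
    raw = row.get("properties_json")
    if not isinstance(raw, str) or not raw:
        return {}
    value = json.loads(raw)
    return value if isinstance(value, dict) else {}


def relationship_counts(edges: Iterable[Mapping[str, Any]]) -> Counter:
    """Count edges per relationship_type (e.g. PARTICIPATED_AS)."""
    return Counter(e["relationship_type"] for e in edges)


def tier_counts(nodes: Iterable[Mapping[str, Any]]) -> Counter:
    """Count (primary_label, tier) pairs; nodes whose tier is not a string
    (None, NaN, pd.NA) are skipped."""
    return Counter(
        (n["primary_label"], n["tier"])
        for n in nodes
        if isinstance(n.get("tier"), str)
    )


# Made-up rows in the nodes.parquet / edges.parquet column layout
nodes = [
    {"node_id": "a", "primary_label": "Agent", "tier": "anchor",
     "properties_json": '{"example_key": 1}'},
    {"node_id": "b", "primary_label": "Event", "tier": None,
     "properties_json": None},
    {"node_id": "c", "primary_label": "Location", "tier": "planet",
     "properties_json": "{}"},
]
edges = [
    {"source_node_id": "a", "target_node_id": "b",
     "relationship_type": "PARTICIPATED_AS",
     "properties_json": '{"example_key": "value"}'},
    {"source_node_id": "b", "target_node_id": "c",
     "relationship_type": "OCCURS_IN", "properties_json": None},
]

print(relationship_counts(edges))
print(tier_counts(nodes))
print(decode_properties(edges[0]))

# Published card totals: tier counts vs. Agent + Location + Object + Organization
assert 471 + 5_616 + 21_118 == 3_948 + 5_819 + 15_986 + 1_452 == 27_205
```

The sketch decodes JSON per row on demand. This avoids expanding every property into its own column, which for a graph this size could produce a very wide, mostly empty table.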
Provider:
brandburner