Tolkien's Legendarium Text Corpus

Source: arXiv; indexed 2025-09-30
Download link:
https://github.com/booknlp/booknlp
Description:

This dataset contains the full English text of three works by J. R. R. Tolkien: *The Lord of the Rings*, *The Hobbit*, and *The Silmarillion*. It is designed for extracting linguistic and literary features, with a particular focus on character relationship networks. It also includes character named entity recognition (NER) annotations and sentence-level character co-occurrence data, with special emphasis on high accuracy in coreference resolution. The intended tasks are character network analysis and narrative structure visualization.
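As a rough illustration of how sentence-level co-occurrence data can feed a character network, the sketch below counts how often two characters are named in the same sentence. It is a minimal sketch under stated assumptions, not the dataset's actual pipeline: the function name `cooccurrence_counts`, the sample sentences, and the character list are all hypothetical, and it matches surface names only, with no coreference resolution.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, characters):
    """Count sentence-level co-occurrences for each pair of character names.

    Hypothetical helper for illustration only. It matches whole-word
    surface names, so pronouns and aliases are not resolved.
    """
    # Pre-compile one whole-word pattern per character name.
    patterns = {
        name: re.compile(r"\b" + re.escape(name) + r"\b")
        for name in characters
    }
    counts = Counter()
    for sentence in sentences:
        # Sort the names so each pair always has the same key order.
        present = sorted(n for n, p in patterns.items() if p.search(sentence))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Made-up sentences, not taken from the dataset.
sentences = [
    "Frodo and Sam left the Shire.",
    "Sam carried Frodo up the mountain.",
    "Gandalf spoke to Frodo.",
]
edges = cooccurrence_counts(sentences, ["Frodo", "Sam", "Gandalf"])

# Each key is an edge of the character network; the count is its weight.
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting pair counts can be read as weighted edges of an undirected graph, which is the usual starting point for character network analysis.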
Provider:
BookNLP