Tolkien's Legendarium Text Corpus
Source: arXiv, indexed 2025-09-30
Download link:
https://github.com/booknlp/booknlp
Resource summary:
This dataset contains the full English text of three works by J. R. R. Tolkien: *The Lord of the Rings*, *The Hobbit*, and *The Silmarillion*. It is intended for extracting linguistic and literary features, with a particular focus on character relationship networks. It also includes character named entity recognition (NER) annotations and sentence-level character co-occurrence data, with particular emphasis on high precision in coreference resolution. Its target tasks are character network analysis and visualization of narrative structure.
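To illustrate how sentence-level co-occurrence turns into a character network, here is a minimal sketch. It assumes each sentence has already been reduced to a list of canonical character IDs after coreference resolution (so a pronoun such as "he" has been mapped to, say, "Frodo"); this input layout and the function names `cooccurrence_edges` and `weighted_degree` are hypothetical and not the dataset's actual schema or BookNLP's output format, which would need converting first. Two characters are linked once for every sentence in which both appear, and weighted degree is used as a simple centrality measure.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(sentence_mentions):
    """Count how often each pair of characters appears in the same sentence.

    sentence_mentions: iterable of lists, one per sentence, holding canonical
    character IDs (assumed to be already resolved by coreference).
    """
    edges = Counter()
    for mentions in sentence_mentions:
        # set() counts a character once per sentence; sorted() makes the pair
        # key order-independent, so ("Sam", "Frodo") and ("Frodo", "Sam") match.
        for a, b in combinations(sorted(set(mentions)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the co-occurrence weights touching each character."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical toy input: one inner list of resolved mentions per sentence.
sentences = [
    ["Frodo", "Sam"],
    ["Frodo", "Gandalf", "Frodo"],
    ["Sam", "Frodo"],
    ["Gandalf"],
]

edges = cooccurrence_edges(sentences)
print(edges[("Frodo", "Sam")])                # 2
print(weighted_degree(edges).most_common(1))  # [('Frodo', 3)]
```

Sentences with a single character contribute no edges, which is why the coreference step matters: a missed or wrongly resolved pronoun silently removes or adds an edge, so precision in that step directly affects the resulting network.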
Provided by:
BookNLP



