SaladTechnologies/fiction-ner-750m
Source: Hugging Face | Updated: 2025-09-09 | Indexed: 2025-10-25
Download link:
https://hf-mirror.com/datasets/SaladTechnologies/fiction-ner-750m
Description:
This dataset contains approximately 750 million tokens of narrative fiction text with entity annotations. The text is sourced from Project Gutenberg and Archive of Our Own (AO3), with a small amount from the Internet Archive. Annotations use the BIO tagging format and cover the following entity types: Character, Location, Facility, Important Object, Event, Organization, and Other Named Entity. The dataset is intended for training Named Entity Recognition (NER) models that perform well on narrative fiction.
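As a rough illustration of how BIO annotations are usually consumed, the sketch below groups tagged tokens into entity spans. The tag strings (`B-CHARACTER`, `I-CHARACTER`, `B-LOCATION`) and the sample sentence are assumptions made for the example; the dataset's actual label names and column layout are not given here and should be checked against its dataset card.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans.

    B-X opens a span of type X, I-X continues an open span of the
    same type, and O (or a mismatched I-X) closes any open span.
    """
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the open span, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open span;
            # this sketch simply drops such stray I- tokens.
            flush()
            current_type, current_tokens = None, []
    flush()
    return spans


# Hypothetical example; the tag names are assumed, not taken from the dataset.
tokens = ["Alice", "Liddell", "walked", "to", "Oxford", "."]
tags = ["B-CHARACTER", "I-CHARACTER", "O", "O", "B-LOCATION", "O"]
print(bio_to_spans(tokens, tags))
# → [('CHARACTER', 'Alice Liddell'), ('LOCATION', 'Oxford')]
```

Dropping stray `I-` tags is one of several possible conventions; some pipelines instead treat an unmatched `I-X` as the start of a new span.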
Provider:
SaladTechnologies



