entity linkage benchmark testing datasets

Name: entity linkage benchmark testing datasets
Creator: Faculty of Mathematics and Information Science, Warsaw University of Technology
Published: 2021-10-04 19:54:43
License: Not specified

Source: arXiv (2021-10-04); catalog entry updated 2024-06-21

Download link:

https://doi.org/10.5281/zenodo.4699418

Description:

This dataset, named 'entity linkage benchmark testing datasets', was created by the Faculty of Mathematics and Information Science, Warsaw University of Technology. It comprises 1,300 sentences drawn from 13 classic novels of varying styles and genres. The sentences were manually annotated by a single fiction enthusiast and are intended for testing and validating character entity linking methods, which comprise named entity recognition (NER) and entity disambiguation. The dataset addresses a specific problem in natural language processing (NLP): the semantic annotation of long texts such as novels, in particular identifying the character entities that appear in a novel and assigning each of them a unique identifier. Potential applications include analyzing relationships between protagonists, generating summaries, detecting locations, and constructing event timelines. Its overall goal is to support in-depth analysis of long-form text through accurate recognition and linking of character entities.
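To illustrate how a character entity linking benchmark of this kind might be used, the sketch below scores a system's output against gold annotations with micro-averaged precision, recall, and F1. The `Mention` record (sentence index, character offsets, character identifier) and the `score_linking` function are hypothetical and chosen only for illustration; they are not the dataset's documented file format, which should be checked in the Zenodo archive. Setting `require_identity=False` scores recognition only (span matches), while the default also requires the correct character identifier, i.e. recognition plus disambiguation.

```python
from collections.abc import Iterable
from typing import NamedTuple


class Mention(NamedTuple):
    # Hypothetical annotation record; NOT the dataset's documented format.
    sentence_id: int   # index of the sentence within the benchmark
    start: int         # character offset where the mention begins
    end: int           # character offset where the mention ends (exclusive)
    character_id: str  # unique identifier assigned to the fictional character


def _keys(mentions: Iterable[Mention], require_identity: bool) -> set[tuple]:
    # With require_identity=False, only the span is compared (recognition);
    # otherwise the character identifier must match too (linking).
    if require_identity:
        return {(m.sentence_id, m.start, m.end, m.character_id) for m in mentions}
    return {(m.sentence_id, m.start, m.end) for m in mentions}


def score_linking(
    gold: Iterable[Mention],
    predicted: Iterable[Mention],
    require_identity: bool = True,
) -> dict[str, float]:
    """Micro-averaged precision, recall, and F1 over exact matches."""
    gold_keys = _keys(gold, require_identity)
    pred_keys = _keys(predicted, require_identity)
    true_pos = len(gold_keys & pred_keys)
    precision = true_pos / len(pred_keys) if pred_keys else 0.0
    recall = true_pos / len(gold_keys) if gold_keys else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy example with placeholder identifiers, not real dataset content.
    gold = [Mention(0, 0, 5, "char_1"), Mention(0, 20, 26, "char_2")]
    pred = [Mention(0, 0, 5, "char_1"), Mention(0, 20, 26, "char_3")]
    print(score_linking(gold, pred))                          # linking
    print(score_linking(gold, pred, require_identity=False))  # recognition only
```

In the toy example both spans are found, so recognition scores perfectly, but one span is linked to the wrong character, which halves the linking precision and recall. Separating the two scores in this way can help show whether errors come from recognition or from disambiguation.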

Providing institution:

Faculty of Mathematics and Information Science, Warsaw University of Technology

Created:

2021-10-04