HistRED

Name: HistRED
Creator: Artificial Intelligence Research Institute, Korea Advanced Institute of Science and Technology (KAIST)
Published: 2023-07-10 08:24:27
License: No description available

Source: arXiv, 2023-07-10 (updated 2024-06-21)

Download link:

https://huggingface.co/datasets/Soyoung/HistRED

Description:

HistRED is a relation extraction dataset for historical documents, developed by the Artificial Intelligence Research Institute of the Korea Advanced Institute of Science and Technology (KAIST). It contains 5,816 documents drawn from Korean and Classical Chinese (Hanja) records of the 16th to 19th centuries, with a particular focus on the Yeonhaengnok. Because HistRED is annotated bilingually, relation extraction can be performed on either the Korean or the Classical Chinese text, which makes the dataset suitable for evaluating how model performance varies across languages and document lengths. The entity and relation types were defined in close collaboration with domain experts so that they fit historical data. HistRED is intended for historical knowledge extraction and for evaluating document-level relation extraction models, addressing the problem of extracting relational information from historical documents.
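To make the evaluation use case more concrete, here is a minimal sketch of micro-averaged precision, recall, and F1 over (head, relation, tail) triples, a common way to score document-level relation extraction, computed separately for each group of documents (for example, per language). It assumes gold and predicted relations are already available as triples; the triple format, the `group`/`gold`/`pred` record keys, and any group names are hypothetical and do not reflect HistRED's actual schema or its official evaluation script.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical triple format: (head entity id, relation label, tail entity id).
Triple = Tuple[str, str, str]


def micro_prf(pairs: Iterable[Tuple[Iterable[Triple], Iterable[Triple]]]) -> Tuple[float, float, float]:
    """Micro-averaged (precision, recall, F1) over (gold, predicted) triples, one pair per document."""
    n_correct = n_pred = n_gold = 0
    for gold, pred in pairs:
        gold_set, pred_set = set(gold), set(pred)
        n_correct += len(gold_set & pred_set)  # exact-match triples
        n_pred += len(pred_set)
        n_gold += len(gold_set)
    p = n_correct / n_pred if n_pred else 0.0
    r = n_correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def prf_by_group(records: List[dict]) -> Dict[str, Tuple[float, float, float]]:
    """Score each group (hypothetical 'group' key, e.g. a language tag) separately."""
    groups: Dict[str, list] = defaultdict(list)
    for rec in records:
        groups[rec["group"]].append((rec["gold"], rec["pred"]))
    return {name: micro_prf(pairs) for name, pairs in groups.items()}
```

Micro-averaging pools counts across documents, so long documents with many relations weigh more than short ones; a per-document (macro) average would be the alternative if each document should count equally.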

Provided by:

Artificial Intelligence Research Institute, Korea Advanced Institute of Science and Technology (KAIST)

Created:

2023-07-10