HistRED
Source: arXiv | Updated: 2023-07-10 | Indexed: 2024-06-21
Download link:
https://huggingface.co/datasets/Soyoung/HistRED
Description:
HistRED is a relation extraction dataset for historical documents, created by the Artificial Intelligence Research Institute of the Korea Advanced Institute of Science and Technology (KAIST). It contains 5,816 documents drawn from Korean and Hanja (Classical Chinese) records of the 16th to 19th centuries, with a particular focus on the Yeonhaengnok (《燕行录》). The annotations are bilingual, so relation extraction can be performed on either the Korean or the Hanja text, and the dataset is suited to evaluating models across languages and document lengths. The entity and relation types were defined in close collaboration with domain experts to fit historical data. Intended uses include historical knowledge extraction and the evaluation of document-level relation extraction models.
Provided by:
Artificial Intelligence Research Institute, Korea Advanced Institute of Science and Technology (KAIST)
Created:
2023-07-10



