DocOIE
Source: arXiv (2021-05-11) · Updated: 2024-06-21
Download link:
https://github.com/daviddongkc/DocOIE
Overview:
DocOIE is a document-level, context-aware open information extraction (OpenIE) dataset created by the School of Computer Science and Engineering, Nanyang Technological University. It contains 800 expert-annotated sentences drawn from 80 documents in two domains, healthcare and transportation. Ten sentences were randomly selected from each document for annotation (80 × 10 = 800), yielding 2,122 relation tuples in total. Document types and domains were chosen deliberately to make the dataset diverse and information-rich. DocOIE is intended for evaluating and improving document-level OpenIE systems: existing systems struggle to exploit document-level context, and the dataset targets that gap to improve the accuracy and completeness of the extracted information.
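The composition described above (per-document random sampling of 10 sentences, each sentence annotated with relation tuples) can be sketched as follows. This is a minimal illustration only: the `RelationTuple` and `AnnotatedSentence` classes and the `sample_sentences`, `mean_tuples_per_sentence`, and `corpus_stats` helpers are hypothetical names, not the file format or code of the DocOIE repository. Returning sampled sentences in document order is also an assumption made here. The only real figures used are the reported counts (80 documents, 10 sentences each, 2,122 tuples).

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


# Hypothetical in-memory model for illustration only; this is NOT the
# annotation file format shipped in the DocOIE GitHub repository.
@dataclass
class RelationTuple:
    arg1: str
    relation: str
    arg2: str


@dataclass
class AnnotatedSentence:
    text: str
    tuples: List[RelationTuple] = field(default_factory=list)


def sample_sentences(sentences: List[str], k: int = 10, seed: Optional[int] = None) -> List[str]:
    """Draw k distinct sentences uniformly at random from one document.

    The result keeps the sentences' original document order (an assumption
    of this sketch, not something the dataset description specifies).
    """
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(sentences)), k))
    return [sentences[i] for i in picked]


def mean_tuples_per_sentence(annotated: List[AnnotatedSentence]) -> float:
    """Average number of relation tuples per annotated sentence."""
    return sum(len(s.tuples) for s in annotated) / len(annotated)


def corpus_stats(num_documents: int, sentences_per_document: int, total_tuples: int) -> Tuple[int, float]:
    """Return (annotated sentence count, mean relation tuples per sentence)."""
    num_sentences = num_documents * sentences_per_document
    return num_sentences, total_tuples / num_sentences


# Reported DocOIE figures: 80 documents x 10 sampled sentences, 2,122 tuples.
n_sentences, mean_tuples = corpus_stats(80, 10, 2122)
print(n_sentences, f"{mean_tuples:.2f}")  # 800 2.65
```

On average, then, each annotated sentence carries roughly 2.65 relation tuples (2,122 / 800).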
Provided by:
School of Computer Science and Engineering, Nanyang Technological University, Singapore
Created:
2021-05-10



