LSOIE
Source: arXiv | Updated: 2021-01-27 | Indexed: 2024-06-21
Download link:
https://github.com/Jacobsolawetz/large-scale-oie
Overview:
LSOIE is a large-scale supervised open information extraction (OIE) dataset created by RoboFlow, Inc. and Rosegold AI. It was built by converting the QA-SRL 2.0 dataset and contains over 70,000 sentences and more than 150,000 extraction tuples, which makes it 20 times the size of the largest existing human-annotated OIE dataset. The dataset spans multiple domains, including a newly added science domain. Its construction relied on a new conversion algorithm together with quality control measures. LSOIE is intended primarily for downstream natural language processing tasks such as knowledge base construction, textual entailment, and natural language understanding, and it aims to address the limited scale and diversity of existing OIE datasets.
Provided by:
RoboFlow, Inc. and Rosegold AI
Created:
2021-01-27



