DuIE

Name: DuIE
Creator: OpenDataLab
Published: 2026-05-17 11:30:42
License: not specified

OpenDataLab. Updated: 2026-05-17. Indexed: 2024-05-09.

Download link:

https://opendatalab.org.cn/OpenDataLab/DuIE


Overview:

DuIE is a large-scale, manually annotated dataset for evaluating schema-based knowledge extraction algorithms. It contains more than 210,000 real-world Chinese sentences and more than 450,000 SPO (subject-predicate-object) triples, annotated against a pre-specified schema with 49 predicates. All sentences are drawn from Baidu Baike (Baidu Encyclopedia) and Baidu News Search, and the text spans domains found in real-world applications, such as news, entertainment, and user-generated content. The dataset comprises 214,590 sentences (172,983 training, 21,626 development, 19,981 test) and 457,866 instances (363,960 training, 45,558 development, 48,348 test).
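The split figures quoted in the description can be checked with a few lines of arithmetic. The sketch below is illustrative only: the dictionary and function names are made up for this example, and it assumes that each "instance" corresponds to one annotated SPO triple, which the description implies but does not state outright.

```python
# Split sizes copied from the dataset description.
# Names below (SENTENCES, INSTANCES, split_shares, ...) are hypothetical,
# chosen for this sketch; they are not part of any DuIE tooling.
SENTENCES = {"train": 172_983, "dev": 21_626, "test": 19_981}
INSTANCES = {"train": 363_960, "dev": 45_558, "test": 48_348}

TOTAL_SENTENCES = sum(SENTENCES.values())
TOTAL_INSTANCES = sum(INSTANCES.values())


def split_shares(counts):
    """Return each split's fraction of the total count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def instances_per_sentence(split):
    """Average number of instances per sentence in one split.

    Assumption: an 'instance' is one annotated SPO triple.
    """
    return INSTANCES[split] / SENTENCES[split]


if __name__ == "__main__":
    print("sentences:", TOTAL_SENTENCES)
    print("instances:", TOTAL_INSTANCES)
    for name, share in split_shares(SENTENCES).items():
        print(f"{name}: {share:.1%} of sentences, "
              f"{instances_per_sentence(name):.2f} instances per sentence")
```

The per-split sums match the stated totals of 214,590 sentences and 457,866 instances, and the ratio shows that, under the assumption above, a sentence carries a little over two instances on average.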

Provider:

OpenDataLab

Created:

2023-04-20

Collected summary

Dataset introduction