DuEL 2.0 Chinese Short-Text Entity Linking Dataset

Indexed from the Qianyan datasets on 2024-05-15
Download link:
https://www.luge.ai/#/luge/dataDetail?id=24
Overview:
DuEL 2.0 is a dataset for Chinese short-text entity linking. Its samples come mainly from search queries, Weibo posts, dialogue content, and titles; they are highly colloquial and carry little surrounding context, which makes linking them difficult. DuEL 2.0 also has the following characteristics:
1. Large scale: 70,000 training samples, 10,000 development samples, and 10,000 test samples, plus a knowledge base of 324,000 entities and 2.826 million SPO (subject-predicate-object) triples.
2. High quality: all annotations were produced through manual crowdsourcing; entity-linking and entity-typing accuracy reaches 95%, and the knowledge-base entity duplication rate is below 5%.
3. Real-world oriented: the data is drawn from web page titles, UGC short-video titles, and search queries.
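As a quick sanity check on the figures above, the short Python sketch below computes each split's share of the 90,000 labeled samples and the average number of SPO triples per knowledge-base entity. It uses only the numbers stated in the description; the constant and function names are illustrative, and the per-entity figure is a simple mean, since the description does not say how the triples are distributed across entities.

```python
# Figures as stated in the DuEL 2.0 description (names are illustrative).
SPLITS = {"train": 70_000, "dev": 10_000, "test": 10_000}
KB_ENTITIES = 324_000
SPO_TRIPLES = 2_826_000


def split_shares(splits):
    """Return each split's share of all labeled samples, as a fraction."""
    total = sum(splits.values())
    return {name: count / total for name, count in splits.items()}


def triples_per_entity(triples, entities):
    """Mean number of SPO triples per knowledge-base entity (assumes nothing about the distribution)."""
    return triples / entities


if __name__ == "__main__":
    print(f"labeled samples: {sum(SPLITS.values())}")
    for name, share in split_shares(SPLITS).items():
        print(f"{name}: {share:.1%}")
    print(f"SPO triples per KB entity: {triples_per_entity(SPO_TRIPLES, KB_ENTITIES):.2f}")
```

Running it prints a 77.8% / 11.1% / 11.1% train/dev/test split and about 8.72 triples per entity on average.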
Provided by:
Baidu
The content above was collected and summarized by Yujian Datasets (遇见数据集).