EDIN

Name: EDIN
Creator: Meta AI and the Center for Information and Language Processing, LMU Munich
Published: 2022-05-25 16:29:39
License: not specified

Source: arXiv; updated 2022-05-25; indexed 2024-07-30

Download link:

https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

Description:

EDIN is a large-scale, end-to-end benchmark dataset for unknown entity discovery and indexing, created by Meta AI and the Center for Information and Language Processing at LMU Munich. It is built from Wikipedia and a subset of news pages from the OSCAR dataset, and it targets two problems: existing knowledge bases are incomplete, and new concepts keep emerging. The benchmark supports detecting, clustering, and indexing unknown entities with dense retrieval techniques. It focuses specifically on unknown entities that are entirely novel to existing entity linking systems, and it uses a temporal split to study how entity encoders and pre-trained language models (PLMs) degrade.
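The detect, cluster, and index steps mentioned above can be illustrated with a toy sketch. This is not the EDIN authors' implementation: the function names, the cosine-similarity threshold, the greedy clustering rule, and the `new_entity_N` identifiers are all assumptions made for illustration, and the vectors stand in for the output of a real dense encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def link_or_flag(mention_vec, index, threshold=0.8):
    """Return the best-matching entity id, or None if the mention looks unknown.

    The threshold is a hypothetical value, not one taken from the benchmark.
    """
    best_id, best_score = None, -1.0
    for entity_id, entity_vec in index.items():
        score = cosine(mention_vec, entity_vec)
        if score > best_score:
            best_id, best_score = entity_id, score
    return best_id if best_score >= threshold else None


def centroid(members):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(xs) / len(members) for xs in zip(*members)]


def cluster_unknown(vectors, threshold=0.8):
    """Greedy clustering: join the first cluster whose centroid is close enough."""
    clusters = []
    for vec in vectors:
        for members in clusters:
            if cosine(vec, centroid(members)) >= threshold:
                members.append(vec)
                break
        else:
            clusters.append([vec])
    return clusters


def discover_and_index(mentions, index, threshold=0.8):
    """Detect unknown mentions, cluster them, and add one entry per cluster."""
    unknown = [m for m in mentions if link_or_flag(m, index, threshold) is None]
    for i, members in enumerate(cluster_unknown(unknown, threshold)):
        index[f"new_entity_{i}"] = centroid(members)  # hypothetical id scheme
    return index
```

For example, calling `discover_and_index` with an index that holds a single known entity and three mentions, two of which point in a different direction from that entity, adds one new entry whose vector is the centroid of those two mentions.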

Provider:

Meta AI and the Center for Information and Language Processing, LMU Munich

Created:

2022-05-25
