
OpenEA dataset v1.1

DataCite Commons; updated 2025-06-01; indexed 2024-07-29
Download link:
https://figshare.com/articles/dataset/OpenEA_dataset_v1_1/19258760/2
Resource summary:
Entity alignment seeks to find entities in different knowledge graphs (KGs) that refer to the same real-world object. Recent advances in KG embedding have given rise to embedding-based entity alignment, which encodes entities in a continuous embedding space and measures entity similarities based on the learned embeddings. In this paper, we conduct a comprehensive experimental study of this emerging field. This study surveys 23 recent embedding-based entity alignment approaches and categorizes them by their techniques and characteristics. We further observe that current approaches are evaluated on different datasets, and that the degree distributions of entities in these datasets are inconsistent with those of real KGs. Hence, we propose a new KG sampling algorithm, with which we generate a set of dedicated benchmark datasets with varied heterogeneity and distributions for a realistic evaluation. This study also produces an open-source library, which includes 12 representative embedding-based entity alignment approaches. We extensively evaluate these approaches on the generated datasets to understand their strengths and limitations. Additionally, we perform exploratory experiments on several directions that current approaches have not explored, and report our preliminary findings for future studies. The benchmark datasets, open-source library and experimental results are all accessible online and will be duly maintained.
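The summary describes embedding-based alignment only at a high level: entities from both KGs live in one embedding space, and alignment is decided by similarity. The following is a minimal, hypothetical sketch of that idea, assuming cosine similarity, greedy nearest-neighbour matching, and the common Hits@k metric. The function names (`cosine`, `align_greedy`, `hits_at_k`) and the toy 2-D vectors are illustrative assumptions, not part of the OpenEA library or its datasets; real approaches learn the embeddings and often use other similarity measures or matching strategies.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def align_greedy(src_emb: List[Vector], tgt_emb: List[Vector]) -> List[int]:
    """For each source entity, pick the index of the most similar target entity."""
    return [
        max(range(len(tgt_emb)), key=lambda j: cosine(s, tgt_emb[j]))
        for s in src_emb
    ]


def hits_at_k(src_emb: List[Vector], tgt_emb: List[Vector],
              gold: List[int], k: int) -> float:
    """Fraction of source entities whose gold counterpart ranks in the top k."""
    hits = 0
    for i, s in enumerate(src_emb):
        scores = [cosine(s, t) for t in tgt_emb]
        # Rank = number of candidates scored strictly higher than the gold target.
        rank = sum(1 for sc in scores if sc > scores[gold[i]])
        if rank < k:
            hits += 1
    return hits / len(src_emb)


# Toy embeddings standing in for learned vectors of two KGs (hypothetical data).
kg1 = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
kg2 = [[0.0, 0.9], [0.95, 0.05], [0.6, 0.8]]
gold = [1, 0, 2]  # kg1 entity i corresponds to kg2 entity gold[i]

print(align_greedy(kg1, kg2))        # [1, 0, 2]
print(hits_at_k(kg1, kg2, gold, 1))  # 1.0
```

Greedy per-entity matching is the simplest choice; it can assign several source entities to the same target, which is one reason some approaches add one-to-one constraints on top of the similarity matrix.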

Provider:
figshare
Created:
2022-03-01