
Self-contained ground-truths for cross-domain linkage

Source: DataCite Commons; updated 2020-09-04; indexed 2024-07-25
Download link:
https://figshare.com/articles/dataset/Self_contained_ground_truths_for_cross_domain_linkage/3204325
Description:
Cross-domain knowledge bases such as DBpedia, Freebase and YAGO have emerged as encyclopedic hubs in the Web of Linked Data. Despite enabling several practical applications in the Semantic Web, the large-scale, schema-free nature of such graphs often prevents research groups from using them widely as evaluation test cases for entity resolution and instance-based ontology alignment. Although ground-truth linkages between the three knowledge bases are available, they are not suited to resource-limited applications. One reason is that the ground-truth files are not self-contained: a researcher must usually perform a series of expensive joins (typically in MapReduce) to obtain usable information sets.

We constructed this resource by uploading several publicly licensed data resources to the public cloud and using simple Hadoop clusters to compile, and make accessible, three cross-domain self-contained test cases involving linked instances from DBpedia, Freebase and YAGO. Self-containment is enabled by a simple NoSQL JSON-like serialization format. Potential applications for these resources, particularly testing transfer learning research hypotheses, are described in more detail in a paper submitted to the resource track at ISWC 2016.
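To illustrate why self-containment matters, the sketch below contrasts a ground-truth file that lists only linked identifier pairs with records that embed both entities' descriptions. It is a minimal in-memory hash join under stated assumptions, not the authors' Hadoop pipeline: the identifiers (`dbpedia:Berlin`, `yago:Berlin`), predicate names, and the `left`/`right`/`attributes` record layout are hypothetical toy values chosen for illustration and are not the dataset's actual schema.

```python
import json
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy input: a ground-truth file that lists only linked ID pairs...
links: List[Tuple[str, str]] = [
    ("dbpedia:Berlin", "yago:Berlin"),
    ("dbpedia:Paris", "yago:Paris"),
]
# ...while entity descriptions live in separate per-knowledge-base dumps.
dbpedia_triples: List[Triple] = [
    ("dbpedia:Berlin", "label", "Berlin"),
    ("dbpedia:Berlin", "country", "Germany"),
    ("dbpedia:Paris", "country", "France"),
]
yago_triples: List[Triple] = [
    ("yago:Berlin", "isLocatedIn", "Germany"),
    ("yago:Paris", "isLocatedIn", "France"),
]


def group_by_subject(triples: List[Triple]) -> Dict[str, Dict[str, List[str]]]:
    """Collect (subject, predicate, object) triples into subject -> {predicate: [objects]}."""
    grouped: Dict[str, Dict[str, List[str]]] = {}
    for s, p, o in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return grouped


def build_self_contained(
    pairs: List[Tuple[str, str]], left_triples: List[Triple], right_triples: List[Triple]
) -> Iterator[str]:
    """Join every linked pair with both descriptions and emit one JSON line per pair."""
    left = group_by_subject(left_triples)    # build side of the hash join
    right = group_by_subject(right_triples)
    for l_id, r_id in pairs:
        record = {
            "left": {"id": l_id, "attributes": left.get(l_id, {})},
            "right": {"id": r_id, "attributes": right.get(r_id, {})},
        }
        yield json.dumps(record, sort_keys=True)


lines = list(build_self_contained(links, dbpedia_triples, yago_triples))
for line in lines:
    print(line)

# A consumer of the self-contained output needs no further joins:
first = json.loads(lines[0])
print(first["right"]["attributes"]["isLocatedIn"])  # ['Germany']
```

Once each line carries both linked entities, a resource-limited consumer can stream the file record by record; the expensive join is paid once, upstream, rather than by every user.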

Provider:
figshare
Created:
2016-04-28