A Novel Curated Scholarly Graph Connecting Textual and Data Publications

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7464119
Description:
This dataset contains an open, curated scholarly graph that we built as a training and test set for data discovery, data connection, author disambiguation, and link prediction tasks. The graph represents the European Marine Science community included in the OpenAIRE Graph.

The nodes of the graph represent publications, datasets, software, and authors. Edges between research products always have the publication as source and the dataset or software as target. Each edge is labeled with semantics that indicate whether the publication is referencing, citing, documenting, or supplementing the related outcome. To curate and enrich node metadata and edge semantics, we relied on information extracted from the PDFs of the publications and from the web pages of the datasets and software. We also curated the authors so as to remove duplicate nodes representing the same person.

The resource counts 4,047 publications, 5,488 datasets, 22 software, 21,561 authors, and 9,692 edges connecting publications to datasets/software. This graph is in the curated_MES folder. We provide it as:

1. A property graph: a dump that can be imported into Neo4j.
2. Five jsonl files containing publications, datasets, software, authors, and relationships respectively. Each line of a jsonl file contains a JSON object that represents a node (or a relationship) and carries its metadata.

We provide two additional scholarly graphs:

1. The curated MES graph with the removed edges. During curation we removed some edges because they were labeled with inconsistent or imprecise semantics. This graph includes the same nodes and edges as the curated graph and, in addition, the edges removed by the curation pipeline; these edges are marked as Removed. This graph is in the curated_MES_with_removed_semantics folder.
2. The original MES community of OpenAIRE. This graph represents the MES community as extracted from the OpenAIRE Research Graph. It has not been curated, and its metadata and semantics are those of the OpenAIRE Research Graph. This graph is in the original_MES_community folder.
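Since each jsonl file holds one JSON object per line, the files can be streamed without loading a whole graph into memory. The following is a minimal Python sketch of that idea, not the authors' tooling: the helper names `read_jsonl` and `count_by_key` are illustrative, and the file name and field name used in the usage note are assumptions, since the description does not document the exact file names or record schema.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator


def read_jsonl(path: Path) -> Iterator[dict]:
    """Yield one JSON object per non-empty line of a JSON Lines file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)


def count_by_key(records: Iterable[dict], key: str) -> Counter:
    """Count records by the value stored under `key`.

    Records that lack the key are counted under None.
    """
    return Counter(record.get(key) for record in records)
```

For example, assuming the relationships file were named `relationships.jsonl` and stored the edge label under a `semantics` field (both hypothetical; check the actual files first), `count_by_key(read_jsonl(Path("curated_MES/relationships.jsonl")), "semantics")` would tally edges per label. Run against the graph in curated_MES_with_removed_semantics, the same tally would show how many edges are marked as Removed.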

Created:
2022-12-21