OpenCitations Meta RDF dataset of all bibliographic metadata and its provenance information
Source: DataCite Commons. Updated: 2025-02-02. Indexed: 2024-07-29.
Download link:
https://figshare.com/articles/dataset/OpenCitations_Meta_RDF_dataset_of_all_bibliographic_metadata_and_its_provenance_information/21747536
Description:
Compared to the previous version, this release includes metadata related to citing and cited bibliographic resources added in the November 2024 version of Crossref, as well as the November 2024 dump of JaLC (Japan Link Center).

In this version, we have focused on correcting a specific type of error, namely the erroneous duplication of resources with the same identifier. We have successfully merged:
- 100% of duplicated identifiers (datacite:Identifier)
- 100% of duplicated responsible agents (foaf:Agent)
- 70% of duplicated bibliographic resources (fabio:Expression)

This dataset contains all the bibliographic metadata and its provenance information (in JSON-LD format) included in OpenCitations Meta. The data and the provenance are organized in a nested structure of folders and subfolders that lets you quickly locate any entity from its URI. The first level consists of the following folders, each provided as a separate compressed archive:
- folder "ar": contains the data and provenance of the agent role entities (http://purl.org/spar/pro/RoleInTime);
- folder "br": contains the data and provenance of the bibliographic resource entities (http://purl.org/spar/fabio/Expression);
- folder "id": contains the data and provenance of the identifier entities (http://purl.org/spar/datacite/Identifier);
- folder "ra": contains the data and provenance of the responsible agent entities (http://xmlns.com/foaf/0.1/Agent);
- folder "re": contains the data and provenance of the resource embodiment entities (http://purl.org/spar/fabio/Manifestation).

The inner folders are named after the supplier prefix of the entities they contain. This prefix identifies the index an entity belongs to (e.g., OpenCitations Meta corresponds to 06*0).

Below that level, the folders have numeric names that indicate the range of entities they contain. For example, the 10000 folder contains entities from 1 to 10000. Inside, you can find the zipped RDF data.

At the same level, additional folders containing the provenance are named by the same criteria. For example, the 1000 folder includes the provenance of the entities from 1 to 1000. The provenance is located inside a folder called prov, also in zipped JSON-LD format.

For example, the data for an entity under the 06250 prefix is located in /br/06250/10000/1000/1000.zip, while its provenance is in /br/06250/10000/1000/prov/1000.zip

This version of the dataset contains:
- 121,302,680 bibliographic entities
- 368,061,399 authors, 2,718,222 editors, and 101,612,475 publishers (counted by their roles, without disambiguating individuals)
- 698,995 publication venues

The compressed archives total 47 GB in tar.gz format and expand to 145 GB when decompressed. The JSON-LD files inside the archives are further compressed as zip files. It is recommended to process these inner files in compressed form, without extracting them, to manage data more efficiently.

Additional information about OpenCitations Meta is available on the official webpage.
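The folder layout and the advice to read the inner zip files without extracting them can be illustrated with a minimal Python sketch. It is not part of the dataset's tooling: the function names (range_upper_bound, entity_paths, iter_jsonld) are hypothetical, the path pattern simply mirrors the example paths quoted in the description, and the assumption that inner members end in ".json" should be checked against the actual archives.

```python
import io
import json
import zipfile


def range_upper_bound(number: int, size: int) -> int:
    """Return the upper bound of the size-wide range that contains `number`.

    With size=10000, entities 1..10000 map to 10000 and 10001..20000 to 20000.
    """
    if number < 1:
        raise ValueError("entity numbers start at 1")
    return (number + size - 1) // size * size


def entity_paths(short_name: str, supplier_prefix: str, number: int):
    """Build (data_path, provenance_path) for an entity.

    Assumption: the layout follows the example paths in the description,
    e.g. /br/06250/10000/1000/1000.zip and /br/06250/10000/1000/prov/1000.zip.
    """
    folder = range_upper_bound(number, 10000)  # 10000-entity folder
    chunk = range_upper_bound(number, 1000)    # 1000-entity chunk
    base = f"/{short_name}/{supplier_prefix}/{folder}/{chunk}"
    return f"{base}/{chunk}.zip", f"{base}/prov/{chunk}.zip"


def iter_jsonld(zip_source):
    """Yield (member_name, parsed_json) for each JSON member of a zip.

    `zip_source` may be a file path or a binary file-like object. Members are
    read straight from the archive; nothing is extracted to disk.
    Assumption: JSON-LD members use a ".json" suffix.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as member:
                    yield name, json.load(member)


if __name__ == "__main__":
    print(entity_paths("br", "06250", 1))

    # Build a tiny in-memory zip to demonstrate reading without extraction.
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("1000.json", json.dumps([{"@id": "example"}]))
    buffer.seek(0)
    for name, document in iter_jsonld(buffer):
        print(name, document)
```

Reading members through zipfile.ZipFile.open keeps disk usage close to the compressed size, which matters given that the archives expand to roughly three times their compressed size.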
Provider:
figshare
Created:
2022-12-18



