DOIBoost Dataset Dump
Source: Zenodo. Updated: 2019-12-02. Indexed: 2026-04-07.
Download link:
https://zenodo.org/record/3559699
Description:
Research in information science and scholarly communication relies heavily on openly accessible metadata datasets and, where possible, their associated payloads. CrossRef plays a pivotal role here by providing free access to its entire metadata collection and by allowing other initiatives to link to and enrich its information. As a consequence, a number of key pieces of information end up scattered across diverse datasets and resources freely available online. Because of this fragmentation, researchers in this domain struggle with integration problems on a daily basis and produce a plethora of ad hoc datasets, which wastes time and resources and runs counter to open science best practices.

The latest DOIBoost release is a metadata collection that enriches CrossRef (October 2019 release: 108,048,986 publication records) with inputs from Microsoft Academic Graph (MAG; October 2019 release: 76,171,072 publication records), ORCID (October 2019 release: 12,642,131 publication records), and Unpaywall (August 2019 release: 26,589,869 publication records), with the aim of supporting high-quality and robust research experiments. As a result, CrossRef records have been "boosted" as follows: 47,254,618 CrossRef records have been enriched with an abstract from MAG; 33,279,428 CrossRef records have been enriched with an affiliation from MAG and/or ORCID; 509,588 CrossRef records have been enriched with an ORCID identifier from ORCID.

This entry consists of three files: doiboost_dump-2019-11-27.tar (contains a set of partXYZ.gz files, each one containing the JSON files for the enriched CrossRef records), schemaAndSample.zip, and termsOfUse.doc (contains details on the terms of use of DOIBoost).

Note that this record comes with two relationships to other results of this experiment: a link to the data paper, for more information on how the dataset is (and can be) generated; and a link to the software, to repeat the experiment.
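As an illustration of how the dump layout described above might be consumed, the following Python sketch streams records out of the tar archive without unpacking it to disk. It rests on an assumption not stated in the description: that each partXYZ.gz file is gzip-compressed text holding one JSON object per line. If the part files are laid out differently, the parsing step needs to change accordingly (schemaAndSample.zip should show the actual record structure). The function name iter_records is hypothetical.

```python
import gzip
import json
import tarfile
from typing import Iterator


def iter_records(tar_path: str) -> Iterator[dict]:
    """Yield one dict per JSON record found in the .gz members of a tar file.

    Assumption (not confirmed by the dataset description): every part file
    stores one JSON object per line.
    """
    with tarfile.open(tar_path, mode="r") as archive:
        for member in archive:
            # Skip directories and anything that is not a gzip part file.
            if not (member.isfile() and member.name.endswith(".gz")):
                continue
            raw = archive.extractfile(member)
            if raw is None:
                continue
            # Decompress the member on the fly and read it as UTF-8 text.
            with gzip.open(raw, mode="rt", encoding="utf-8") as lines:
                for line in lines:
                    line = line.strip()
                    if line:
                        yield json.loads(line)
```

Because the function is a generator, a call such as `sum(1 for _ in iter_records("doiboost_dump-2019-11-27.tar"))` would count records while keeping only one record in memory at a time, which matters for a dump of this size.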
Provider:
Institute of Information Science and Technology - CNR; Knowledge Media Institute - Open University
Created:
2019-12-02



