Core bibliometric Covid19 and comparable research dataset and code for the study "From intent to impact: Investigating the effects of open sharing commitments"
Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://zenodo.org/record/6582758
Resource description:
This document provides the underlying dataset for the bibliometric component of the 2022 study "From intent to impact: Investigating the effects of open sharing commitments" by Research Consulting and Science-Metrix.
Before reproducing the study findings or re-using the underlying datasets for other purposes, please carefully review their limitations in the study's technical annex and main report, available at: https://zenodo.org/communities/data-sharing-in-public-health-emergencies/
In particular, note that there are error rates in the attribution of signatory status to journal publications and preprints; in their assignment to specific disease-based thematic areas; and in the computation of dimensions such as the identification of data availability statement sections, the identification of data deposition mentions within those sections, and the matching of preprints and journal publications.
These error rates are expected and have been estimated; please consult the technical report for full details.
Definitions of the data fields are provided in the table below:
Column name
Definition
document_type
preprint or journal publication
doi
digital object identifier
arxiv_id
arXiv preprint server's unique identifier for its preprints
ssrn_id
SSRN preprint server's unique identifier for its preprints. Note that some of these IDs are contained within the DOIs also assigned to some (but not all) SSRN preprints, in the form of "10.2139/ssrn." + 'ssrn_id'
coalesce_id
coalesce function applied to the DOI, arxiv_id and ssrn_id. Redundant for journal publications.
preprint_server
Preprint platform on which a preprint has been published, restricted to arXiv, bioRxiv, medRxiv and SSRN for this study.
journal_title
Publishing journal name in the case of a journal publication.
year
The set is restricted to 2020 and 2021 for Covid19 preprints and journal publications. HVRD journal publications were restricted to 2018-2019. HVRD preprints were restricted to 2020-2021 instead, to compensate for the lack of year-normalization for preprints, and generally to better control findings against the launch of medRxiv in 2019.
publication_title
Title of the individual journal publication or preprint, not that of the publishing journal or preprint server.
authors
First 100 researchers that appear as authors of a preprint or journal publication. These are not parsed and are provided for qualitative validation or assessment rather than for further quantitative treatment.
Covid19
Journal publications or preprints are coded 1 if they have been identified as falling into this thematic area through our queries (see the technical annex), 0 otherwise.
HVRD
Human viral respiratory disease, the thematic area considered to be the closest to Covid19. Journal publications or preprints are coded 1 if they have been identified as falling into this thematic area through our queries (see the technical annex), 0 otherwise.
Journal_sig
Journal publications where the publishing journal and/or its publishing house are Joint Statement signatories. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata. Note that all preprint servers included in this study are Joint Statement signatories. This category was fully removed from the models for preprints, rather than all preprints being assigned automatic signatory status.
RPO_sig
Journal publications and preprints where at least one author is affiliated with at least one research performing organization that is a Joint Statement signatory. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata.
Funder_sig
Journal publications and preprints where at least one funder supporting the research is a Joint Statement signatory. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata. Although funding is attributed to researchers rather than publications, funding metadata is more readily available at the publication level. This approach also captures the flexible use of financial resources that researchers may make across multiple concurrently ongoing research projects.
overton_norm
Year- and subfield-normalized binary score of whether the journal publication has been cited by one or more policy-related documents from the Overton database. Null scores for journal publications not covered by the database.
overton
Because normalization was not possible for preprints, a non-normalized binary score of whether the preprint has been cited by one or more policy-related documents from the Overton database. Null scores for preprints not covered by the database.
daswriting_binary
Binary score capturing identification of a data availability statement in the journal publication or preprint using the queries presented in the technical annex. Null scores are for publications and preprints where records of full texts were unavailable for text mining, or where this analysis could not be performed due to licensing restrictions.
deposition_binary
Binary score capturing identification of a data availability statement, and of a data deposition mention therein, in the journal publication or preprint using the queries presented in the technical annex. Null scores are for publications and preprints where records of full texts were unavailable for text mining, or where this analysis could not be performed due to licensing restrictions.
is_oa
Binary score capturing OA or free-to-read (also so-called "bronze OA" and "green OA") status of journal publications. Unpaywall categories have been used in a mutually exclusive implementation, with the best possible applicable category (gold > hybrid > bronze > green) being retained. Null scores for journal publications not covered in our Unpaywall dataset. Scores of 0 denote journal publications not available under an OA or free-to-read category.
is_gold
as above
is_hybrid
as above
is_bronze
as above
is_green
as above
matched_journal_binary
For preprints, whether one or more matching journal publications could be identified using the queries identified in the technical annex, or preprint servers' own lists of preprint-journal publication matches. Null scores for preprints with insufficient metadata to perform the matching operation.
matched_journal_doi
For those preprints with one or more matching journal publications, the DOI(s) of the matching journal publication(s). Note that some of the matching journal publications identified do not have DOIs.
matched_preprint_binary
For journal publications, whether one or more matching preceding preprints could be identified using the queries identified in the technical annex, or preprint servers' own lists of preprint-journal publication matches. Null scores for journal publications without sufficient metadata to run the analysis.
matched_preprint_id
For those journal publications preceded by one or more arXiv, bioRxiv, medRxiv or SSRN preprints, the DOI(s), arXiv ID(s) and/or SSRN ID(s) of the matching preprint(s).
hasdoi
Only journal publications with DOIs were retained in the core quantitative analyses.
hasacknowledgements
Only journal publications with funding acknowledgements (to determine funding-based signatory status) were retained in the core quantitative analyses.
funder_array
Array (but cast as string) of names of the funders on the basis of whose identification signatory status has been attributed, where relevant. Null if non-signatory or unknown signatory status.
RPO_array
Array (but cast as string) of names of the research performing organizations on the basis of whose identification signatory status has been attributed, where relevant. Null if non-signatory or unknown signatory status.
DAS_excerpt
Journal publication or preprint text excerpt on which successful identification of data availability statements and/or data deposition mentions has been made. Null both where the query could not be run at all and where the query was negative.
big5
Journal publication published in a journal owned by one of the following five publishing houses: Elsevier, Sage, Springer Nature, Taylor & Francis, Wiley.
LMIC
Journal publication whose authors include at least one researcher affiliated with at least one institution located in a lower-middle income country as defined by the World Bank
LIC
Journal publication whose authors include at least one researcher affiliated with at least one institution located in a low income country as defined by the World Bank
SouthNorth
Journal publication whose authors include at least one researcher affiliated with at least one institution located in an upper-middle income country, a lower-middle income country, or a low income country as defined by the World Bank; as well as at least one researcher affiliated with at least one institution located in a high income country. For the purpose of this indicator, Science-Metrix exceptionally includes China and Bulgaria in the list of high income countries.
DID_allauthors_OR
Journal publication is included in the difference-in-difference model defining signatory publication as EITHER holding journal-based signatory status OR funding-based signatory status, and where no filter has been applied to control for author-level biases.
DID_authorcontrol_OR
Journal publication is included in the difference-in-difference model defining signatory publication as EITHER holding journal-based signatory status OR funding-based signatory status, and where a filter has been applied to control for author-level biases.
DID_authorcontrol_AND
Journal publication is included in the difference-in-difference model defining signatory publication as holding journal-based signatory status AND funding-based signatory status, and where a filter has been applied to control for author-level biases.
DID_allauthors_AND
Journal publication is included in the difference-in-difference model defining signatory publication as holding journal-based signatory status AND funding-based signatory status, and where no filter has been applied to control for author-level biases.
Preprint_authorcontrol
Preprint is included in the analytical breakdowns where a filter has been applied to control for author-level biases. Note that authors have been kept constant in preprints on the basis of their belonging to all analytical breakdowns in journal publications rather than in preprint-based groups.
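Two of the definitions above describe small derivations that can be illustrated in code: coalesce_id (a coalesce over the DOI, arxiv_id and ssrn_id) and the mutually exclusive OA categories (gold > hybrid > bronze > green). The following is a minimal Python sketch, not the study's own code. It assumes that each record has been loaded as a dictionary keyed by the column names above, that missing values are None, and that coalesce takes the identifiers in the order in which the definition lists them. The helper names (coalesce, ssrn_id_from_doi, best_oa_category) and the sample identifiers are hypothetical.

```python
from typing import Optional

# DOI prefix quoted in the ssrn_id definition.
SSRN_DOI_PREFIX = "10.2139/ssrn."

# Priority order quoted in the is_oa definition: gold > hybrid > bronze > green.
OA_PRIORITY = ("is_gold", "is_hybrid", "is_bronze", "is_green")


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first value that is neither None nor an empty string."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def ssrn_id_from_doi(doi: Optional[str]) -> Optional[str]:
    """Extract the SSRN ID from a DOI of the form '10.2139/ssrn.<ssrn_id>'."""
    if doi and doi.lower().startswith(SSRN_DOI_PREFIX):
        return doi[len(SSRN_DOI_PREFIX):]
    return None


def best_oa_category(record: dict) -> Optional[str]:
    """Return the highest-priority OA flag set to 1 for a record.

    Returns None when is_oa is null (not covered by Unpaywall) and
    'closed' when no OA or free-to-read category applies.
    """
    if record.get("is_oa") is None:
        return None
    for column in OA_PRIORITY:
        if record.get(column) == 1:
            return column
    return "closed"


# Made-up identifiers, for illustration only.
preprint = {"doi": None, "arxiv_id": "2003.00001", "ssrn_id": None}
print(coalesce(preprint["doi"], preprint["arxiv_id"], preprint["ssrn_id"]))
print(ssrn_id_from_doi("10.2139/ssrn.3550000"))
print(best_oa_category({"is_oa": 1, "is_gold": 0, "is_hybrid": 1,
                        "is_bronze": 0, "is_green": 0}))
```

Because the categories are described as mutually exclusive, at most one of the four flags should be 1 for any record; the priority loop only matters if a re-derived dataset breaks that assumption.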
Created:
2022-06-16



