
Core bibliometric Covid19 and comparable research dataset and code for the study "From intent to impact: Investigating the effects of open sharing commitments"

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6582758
Resource description:
This document provides the underlying dataset for the bibliometric component of the 2022 study "From intent to impact: Investigating the effects of open sharing commitments" by Research Consulting and Science-Metrix. Before reproducing the study findings or re-using the underlying datasets for other purposes, please carefully review their limitations in the study's technical annex and main report, available at: https://zenodo.org/communities/data-sharing-in-public-health-emergencies/

In particular, note that there is an error rate in the attribution of signatory status to journal publications and preprints; in their placement within specific disease-based thematic areas; in the computation of dimensions such as the identification of data availability statement sections and of data deposition mentions within those sections; and in the matching of preprints and journal publications. These error rates are expected and have been estimated; please consult the technical report for full details.

Definitions of the data fields are provided in the table below:

| Column name | Definition |
| --- | --- |
| document_type | Preprint or journal publication. |
| doi | Digital object identifier. |
| arxiv_id | arXiv preprint server's unique identifier for its preprints. |
| ssrn_id | SSRN preprint server's unique identifier for its preprints. Note that some of these IDs are also contained within the DOIs assigned to some (but not all) SSRN preprints, in the form "10.2139/ssrn." + 'ssrn_id'. |
| coalesce_id | Coalesce function applied to the DOI, arxiv_id and ssrn_id. Redundant for journal publications. |
| preprint_server | Preprint platform on which a preprint has been published, restricted to arXiv, bioRxiv, medRxiv and SSRN for this study. |
| journal_title | Name of the publishing journal, in the case of a journal publication. |
| year | The set is restricted to 2020 and 2021 for Covid19 preprints and journal publications. HVRD journal publications are restricted to 2018-2019. HVRD preprints were restricted to 2020-2021 instead, to compensate for the lack of year-normalization for preprints, and more generally to better control findings against the launch of medRxiv in 2019. |
| publication_title | Title of the individual journal publication or preprint, not that of the publishing journal or preprint server. |
| authors | First 100 researchers that appear as authors of a preprint or journal publication. These are not parsed, and are provided for qualitative validation or assessment rather than for further quantitative treatment. |
| Covid19 | Journal publications or preprints are coded 1 if they have been identified as falling into this thematic area through our queries (see the technical annex), 0 otherwise. |
| HVRD | Human viral respiratory disease, the thematic area considered to be the closest to Covid19. Journal publications or preprints are coded 1 if they have been identified as falling into this thematic area through our queries (see the technical annex), 0 otherwise. |
| Journal_sig | Journal publications where the publishing journal and/or its publishing house are Joint Statement signatories. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata. Note that all preprint servers included in this study are Joint Statement signatories. This category was fully removed from the models for preprints, rather than all preprints being assigned automatic signatory status. |
| RPO_sig | Journal publications and preprints where at least one author is affiliated with at least one research performing organization that is a Joint Statement signatory. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata. |
| Funder_sig | Journal publications and preprints where at least one funder supporting the research is a Joint Statement signatory. Coded as 1 if signatory, 0 if not signatory, null if status could not be determined due to insufficient metadata. Although funding is attributed to researchers rather than publications, funding metadata is more readily available at the publication level. This approach also captures the flexible use of financial resources that researchers may make across multiple concurrently ongoing research projects. |
| overton_norm | Year- and subfield-normalized binary score indicating whether the journal publication has been cited by one or more policy-related documents from the Overton database. Null scores for journal publications not covered by the database. |
| overton | Normalization being unavailable for preprints, binary score indicating whether the preprint has been cited by one or more policy-related documents from the Overton database. Null scores for preprints not covered by the database. |
| daswriting_binary | Binary score capturing identification of a data availability statement in the journal publication or preprint using the queries presented in the technical annex. Null scores are for publications and preprints where records of full texts were unavailable for text mining, or where this analysis could not be performed due to licensing restrictions. |
| deposition_binary | Binary score capturing identification of a data availability statement and of a data deposition mention therein in the journal publication or preprint using the queries presented in the technical annex. Null scores are for publications and preprints where records of full texts were unavailable for text mining, or where this analysis could not be performed due to licensing restrictions. |
| is_oa | Binary score capturing OA or free-to-read (also called "bronze OA" and "green OA") status of journal publications. Unpaywall categories have been used in a mutually exclusive implementation, with the best possible applicable category being retained (gold > hybrid > bronze > green). Null scores for journal publications not covered in our Unpaywall dataset. Scores of 0 denote journal publications not available under an OA or free-to-read category. |
| is_gold | As above. |
| is_hybrid | As above. |
| is_bronze | As above. |
| is_green | As above. |
| matched_journal_binary | For preprints, whether one or more matching journal publications could be identified using the queries identified in the technical annex, or preprint servers' own lists of preprint-journal publication matches. Null scores for preprints with insufficient metadata to perform the matching operation. |
| matched_journal_doi | For those preprints with one or more matching journal publications, the DOI(s) of the matching journal publication(s). Note that some of the matching journal publications identified do not have DOIs. |
| matched_preprint_binary | For journal publications, whether one or more matching preceding preprints could be identified using the queries identified in the technical annex, or preprint servers' own lists of preprint-journal publication matches. Null scores for journal publications without sufficient metadata to run the analysis. |
| matched_preprint_id | For those journal publications preceded by one or more arXiv, bioRxiv, medRxiv or SSRN preprints, the DOI(s), arXiv ID and/or SSRN ID of the matching preprint(s). |
| hasdoi | Only journal publications with DOIs were retained in the core quantitative analyses. |
| hasacknowledgements | Only journal publications with funding acknowledgements (to determine funding-based signatory status) were retained in the core quantitative analyses. |
| funder_array | Array (but cast as string) of the names of the funders on the basis of whose identification signatory status has been attributed, where relevant. Null if non-signatory or unknown signatory status. |
| RPO_array | Array (but cast as string) of the names of the research performing organizations on the basis of whose identification signatory status has been attributed, where relevant. Null if non-signatory or unknown signatory status. |
| DAS_excerpt | Journal publication or preprint text excerpt on which successful identification of data availability statements and/or data deposition mentions has been made. Null both where the query could not be run at all and where the query was negative. |
| big5 | Journal publication published in a journal owned by one of the following five publishing houses: Elsevier, Sage, Springer Nature, Taylor & Francis, Wiley. |
| LMIC | Journal publication whose authors include at least one researcher affiliated with at least one institution located in a lower-middle income country as defined by the World Bank. |
| LIC | Journal publication whose authors include at least one researcher affiliated with at least one institution located in a low income country as defined by the World Bank. |
| SouthNorth | Journal publication whose authors include at least one researcher affiliated with at least one institution located in an upper-middle income country, a lower-middle income country, or a low income country as defined by the World Bank, as well as at least one researcher affiliated with at least one institution located in a high income country. For the purpose of this indicator, Science-Metrix exceptionally includes China and Bulgaria in the list of high income countries. |
| DID_allauthors_OR | Journal publication is included in the difference-in-difference model defining a signatory publication as holding EITHER journal-based signatory status OR funding-based signatory status, and where no filter has been applied to control for author-level biases. |
| DID_authorcontrol_OR | Journal publication is included in the difference-in-difference model defining a signatory publication as holding EITHER journal-based signatory status OR funding-based signatory status, and where a filter has been applied to control for author-level biases. |
| DID_authorcontrol_AND | Journal publication is included in the difference-in-difference model defining a signatory publication as holding journal-based signatory status AND funding-based signatory status, and where a filter has been applied to control for author-level biases. |
| DID_allauthors_AND | Journal publication is included in the difference-in-difference model defining a signatory publication as holding journal-based signatory status AND funding-based signatory status, and where no filter has been applied to control for author-level biases. |
| Preprint_authorcontrol | Preprint is included in the analytical breakdowns where a filter has been applied to control for author-level biases. Note that authors have been kept constant in preprints on the basis of their belonging to all analytical breakdowns in journal publications rather than in preprint-based groups. |
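
The definitions of coalesce_id, ssrn_id and the is_* open-access fields can be illustrated with a minimal Python sketch. It is not the study's own code: the function names are hypothetical, nulls are assumed to be represented as None, the coalesce is assumed to take the first non-null value in the order DOI, arxiv_id, ssrn_id, and the "closed" label for is_oa = 0 is introduced here for illustration only. All identifiers in the usage lines are made up.

```python
from typing import Optional

# Form given in the ssrn_id definition: "10.2139/ssrn." + ssrn_id
SSRN_DOI_PREFIX = "10.2139/ssrn."

# Priority order stated in the is_oa definition (best category first)
OA_PRIORITY = ("gold", "hybrid", "bronze", "green")


def coalesce_id(doi: Optional[str], arxiv_id: Optional[str],
                ssrn_id: Optional[str]) -> Optional[str]:
    """Return the first non-null, non-empty identifier.

    Assumption: the order is doi, then arxiv_id, then ssrn_id.
    """
    for value in (doi, arxiv_id, ssrn_id):
        if value is not None and str(value).strip() != "":
            return str(value).strip()
    return None


def ssrn_doi(ssrn_id) -> str:
    """Build the DOI form that some (not all) SSRN preprints carry."""
    return SSRN_DOI_PREFIX + str(ssrn_id)


def best_oa_category(record: dict) -> Optional[str]:
    """Collapse the mutually exclusive is_* flags into a single label.

    Returns None when is_oa is null (not covered by the Unpaywall data),
    "closed" (hypothetical label) when is_oa is 0, and otherwise the
    first flagged category in gold > hybrid > bronze > green order.
    """
    is_oa = record.get("is_oa")
    if is_oa is None:
        return None
    if int(is_oa) == 0:
        return "closed"
    for category in OA_PRIORITY:
        if record.get("is_" + category) == 1:
            return category
    return None  # is_oa set but no category flag found


# Usage with made-up values
preprint_key = coalesce_id(None, "2005.00001", None)
example_doi = ssrn_doi(3600000)
oa_label = best_oa_category(
    {"is_oa": 1, "is_gold": 0, "is_hybrid": 1, "is_bronze": 0, "is_green": 0}
)
```

Because the flags are described as mutually exclusive, at most one is_* column should equal 1 for any row; the priority loop only matters if that invariant is ever violated in a re-used copy of the data.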
Created:
2022-06-16