Diversity in citations to a single study: Supplementary data set for citation context network analysis

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/5244799

Resource description:

Introduction

This document describes the data set used for all analyses in 'Diversity in citations to a single study: A citation context network analysis of how evidence from a prospective cohort study was cited', accepted for publication in Quantitative Science Studies [1].

Data Collection

The data collection procedure is fully described in [1]. In brief, the data set contains bibliometric data collected from the Web of Science Core Collection, via the University of Edinburgh’s Library subscription, on all papers that cited a cohort study, Paul et al. [2], before 1985. This includes a full list of citing papers and the citations between these papers. It also includes textual passages (citation contexts) from 343 citing papers, which were manually recovered from the full-text documents accessible via the same subscription. These data have been cleaned, converted into network-readable data sets, and coded into classifications reflecting content, which are described fully in the supplied code book and in the manuscript [1].

Data description

All relevant data can be found in the attached file 'Supplementary_material_Leng_QSS_2021.xlsx', which contains the following workbooks:

“Overview” lists the content of the workbooks.

“Code Book” contains the coding rules and definitions used for the classification of findings and paper titles.

“Node attribute list” contains all node attributes for the citation network, which comprises Paul et al. [2] and its citing papers as of 1984. Two papers highlighted in yellow at the bottom of this workbook were discarded as duplicates; remove them before using this data set in a network analysis. The columns are:
  Id, the node identifier.
  Label, the formal citation of the paper to which the data in this row correspond.
  The citation is in the following format: last name of first author, year of publication, journal of publication, volume number, start page, and DOI (if available).
  Title, the title of the paper.
  Publication_year, the year of publication.
  Document_type, the document type (e.g. review, article).
  WoS_ID, the paper’s unique Web of Science accession number.
  Citation_context, whether citation context data are available for that paper.
  Explanans, the title explanans terms for that paper.
  Explanandum, the explanandum terms for that paper.
  Combined_Title_Classification, the combined terms used for Fig 2 of the published manuscript.
  Serum_cholesterol_(SC), identifies papers that cited the serum cholesterol findings.
  Blood_Pressure_(BP), identifies papers that cited the blood pressure findings.
  Coffee_(C), identifies papers that cited the coffee findings.
  Diet_(D), identifies papers that cited the dietary findings.
  Smoking_(S), identifies papers that cited the smoking findings.
  Alcohol_(A), identifies papers that cited the alcohol findings.
  Physical_Activity_(PA), identifies papers that cited the physical activity findings.
  Body_Fatness (BF), identifies papers that cited the body fatness findings.
  Indegree, the number of within-network citations to that paper, calculated for the network shown in Fig 4 of the manuscript.
  Outdegree, the number of within-network references made by that paper, calculated for the network shown in Fig 4.
  Main_component, whether the node belongs to the largest weakly connected component shown in Fig 4 of the manuscript.
  Cluster, the cluster membership number discussed in the manuscript (Fig 5).

“Edge list” contains the edges of the network. The columns are:
  Source, the node identifier of the citing paper.
  Target, the node identifier of the cited paper.
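The Indegree, Outdegree, and Main_component columns can in principle be recomputed from the edge list alone. The following is a minimal sketch in plain Python, not the procedure used for the manuscript (which relied on Gephi). The function name degree_and_main_component and the toy node identifiers are hypothetical; the real identifiers come from the Id, Source, and Target columns.

```python
from collections import defaultdict


def degree_and_main_component(node_ids, edges):
    """Compute degrees and the largest weakly connected component.

    node_ids: iterable of node identifiers (the Id column).
    edges: iterable of (source, target) pairs, where source cites target.
    Edges touching an unknown node (e.g. a discarded duplicate) are skipped.
    Repeated edges are counted each time they appear.
    """
    indegree = {n: 0 for n in node_ids}
    outdegree = {n: 0 for n in indegree}
    neighbours = defaultdict(set)  # undirected view, used for weak connectivity

    for source, target in edges:
        if source not in indegree or target not in indegree:
            continue
        outdegree[source] += 1
        indegree[target] += 1
        neighbours[source].add(target)
        neighbours[target].add(source)

    # Depth-first search over the undirected view, keeping the largest component.
    seen = set()
    largest = set()
    for start in indegree:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        if len(component) > len(largest):
            largest = component
    return indegree, outdegree, largest


# Toy data (hypothetical): node 1 plays the role of the cited cohort study.
nodes = [1, 2, 3, 4, 5]
edges = [(2, 1), (3, 1), (3, 2), (5, 4)]
indeg, outdeg, main = degree_and_main_component(nodes, edges)
```

In this toy network, node 1 has an indegree of 2, node 3 has an outdegree of 2, and the main component is {1, 2, 3}; nodes 4 and 5 form a separate, smaller component.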
“Citation context classification” contains the WoS accession number for each paper analysed and any finding category discussed in that paper, as established via context analysis (see the code book for definitions). The columns are:
  Id, the node identifier.
  Finding_Class, the findings from Paul et al. discussed within the body of the citing paper.

“Citation context data” contains the WoS accession number for papers in which citation context data were available, the citation context passages, the reference number or format of Paul et al. within the citing paper, and the finding categories discussed in those contexts (see the code book for definitions). The columns are:
  Id, the node identifier.
  Citation_context, the passage copied from the full text of the citing paper that discusses the findings of Paul et al.
  Reference_in_citing_article, the reference number or format of Paul et al. within the citing paper.
  Finding_class, the findings from Paul et al. discussed within the body of the citing paper.

Software recommended for analysis

For the analyses performed within the manuscript, Gephi version 0.9.2 was used [3], and both the edge and node lists are in a format that is easily read into this software. The Sci2 tool was used to parse the data initially [4].

Notes

[1] Leng, R. I. (Forthcoming). Diversity in citations to a single study: A citation context network analysis of how evidence from a prospective cohort study was cited. Quantitative Science Studies.
[2] Paul, O., Lepper, M. H., Phelan, W. H., Dupertuis, G. W., Macmillan, A., McKean, H., et al. (1963). A longitudinal study of coronary heart disease. Circulation, 28, 20-31. https://doi.org/10.1161/01.cir.28.1.20.
[3] Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.
[4] Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Stable URL: https://sci2.cns.iu.edu
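Because the description asks that the two discarded duplicate papers be removed before network analysis, one way to prepare the node and edge lists for Gephi is sketched below. It is an illustration under stated assumptions, not the author's workflow: the function name to_gephi_csv, the discarded_ids parameter, and the toy rows are hypothetical, and the duplicates (marked only by yellow highlighting in the workbook) have to be identified by hand. The Id/Label and Source/Target headers follow the column names given in the data description.

```python
import csv
import io


def to_gephi_csv(nodes, edges, discarded_ids=()):
    """Return (node_csv, edge_csv) strings with discarded nodes removed.

    nodes: list of dicts with at least "Id" and "Label" keys.
    edges: list of dicts with "Source" and "Target" keys.
    discarded_ids: node identifiers to drop (assumed to be chosen by hand).
    """
    discarded = set(discarded_ids)
    kept_nodes = [n for n in nodes if n["Id"] not in discarded]
    kept_ids = {n["Id"] for n in kept_nodes}
    # Drop any edge that starts or ends at a removed node.
    kept_edges = [
        e for e in edges
        if e["Source"] in kept_ids and e["Target"] in kept_ids
    ]

    node_buf = io.StringIO()
    writer = csv.DictWriter(node_buf, fieldnames=["Id", "Label"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(kept_nodes)

    edge_buf = io.StringIO()
    writer = csv.DictWriter(edge_buf, fieldnames=["Source", "Target"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(kept_edges)

    return node_buf.getvalue(), edge_buf.getvalue()


# Toy rows (hypothetical); node 3 stands in for a discarded duplicate.
nodes = [
    {"Id": 1, "Label": "Paper A"},
    {"Id": 2, "Label": "Paper B"},
    {"Id": 3, "Label": "Paper B duplicate"},
]
edges = [{"Source": 2, "Target": 1}, {"Source": 3, "Target": 1}]
node_csv, edge_csv = to_gephi_csv(nodes, edges, discarded_ids=[3])
```

Only the Id and Label columns are written here to keep the sketch short; further node attributes could be carried over by extending the fieldnames list.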

Created:

2021-08-25
