
Diversity in citations to a single study: Supplementary data set for citation context network analysis

Mendeley Data, updated 2024-03-27, indexed 2024-06-27
Download link:
https://zenodo.org/record/5244800
Resource description:
Introduction

This document describes the data set used for all analyses in 'Diversity in citations to a single study: A citation context network analysis of how evidence from a prospective cohort study was cited', accepted for publication in Quantitative Science Studies [1].

Data Collection

The data collection procedure has been fully described [1]. In brief, the data set contains bibliometric data collected from Web of Science Core Collection, via the University of Edinburgh's Library subscription, on all papers published before 1985 that cited a cohort study, Paul et al. [2]. This includes a full list of citing papers and the citations between these papers. It also includes textual passages (citation contexts) from 343 citing papers, which were manually recovered from the full-text documents accessible via the University of Edinburgh's Library subscription. These data have been cleaned, converted into network-readable datasets, and coded into classifications reflecting their content, which are described fully in the supplied code book and in the manuscript [1].

Data description

All relevant data can be found in the attached file 'Supplementary_material_Leng_QSS_2021.xlsx', which contains the following workbooks:

1. "Overview" lists the content of the workbooks.

2. "Code Book" contains the coding rules and definitions used for the classification of findings and paper titles.

3. "Node attribute list" contains all node attributes for the citation network, which includes Paul et al. [2] and its citing papers as of 1984. Highlighted in yellow at the bottom of this workbook are two papers that were discarded due to duplication; remove these before running a network analysis on this dataset. The columns refer to:
- Id, the node identifier.
- Label, the formal citation of the paper to which the data in this row corresponds. The citation is in the following format: last name of first author, year of publication, journal of publication, volume number, start page, and DOI (if available).
- Title, the title of the paper.
- Publication_year, the year of publication.
- Document_type, the document type (e.g. review, article).
- WoS_ID, the paper's unique Web of Science accession number.
- Citation_context, a column specifying whether citation context data is available for that paper.
- Explanans, the title explanans terms for that paper.
- Explanandum, the title explanandum terms for that paper.
- Combined_Title_Classification, the combined terms used for Fig 2 of the published manuscript.
- Serum_cholesterol_(SC), a column identifying papers that cited the serum cholesterol findings.
- Blood_Pressure_(BP), a column identifying papers that cited the blood pressure findings.
- Coffee_(C), a column identifying papers that cited the coffee findings.
- Diet_(D), a column identifying papers that cited the dietary findings.
- Smoking_(S), a column identifying papers that cited the smoking findings.
- Alcohol_(A), a column identifying papers that cited the alcohol findings.
- Physical_Activity_(PA), a column identifying papers that cited the physical activity findings.
- Body_Fatness (BF), a column identifying papers that cited the body fatness findings.
- Indegree, the number of within-network citations to that paper, calculated for the network shown in Fig 4 of the manuscript.
- Outdegree, the number of within-network references made by that paper, calculated for the network shown in Fig 4.
- Main_component, a column specifying whether a node belongs to the largest weakly connected component, as shown in Fig 4 of the manuscript.
- Cluster, the cluster membership number as discussed in the manuscript (Fig 5).

4. "Edge list" contains the edges of the network. The columns refer to:
- Source, the node identifier of the citing paper.
- Target, the node identifier of the cited paper.

5. "Citation context classification" contains the WoS accession number for each paper analysed and any finding category discussed in that paper, as established via context analysis (see the code book for definitions). The columns refer to:
- Id, the node identifier.
- Finding_Class, the findings from Paul et al. discussed within the body of the citing paper.

6. "Citation context data" contains the WoS accession number for papers in which citation context data was available, the citation context passages, the reference number or format of Paul et al. within the citing paper, and the finding categories discussed in those contexts (see the code book for definitions). The columns refer to:
- Id, the node identifier.
- Citation_context, the passage copied from the full text of the citing paper that discusses the findings of Paul et al.
- Reference_in_citing_article, the reference number or format of Paul et al. within the citing paper.
- Finding_class, the findings from Paul et al. discussed within the body of the citing paper.

Software recommended for analysis

The analyses in the manuscript were performed with Gephi version 0.9.2 [3], and both the edge and node lists are in a format that this software reads directly. The Sci2 tool was used to parse the data initially [4].

Notes

[1] Leng, R. I. (Forthcoming). Diversity in citations to a single study: A citation context network analysis of how evidence from a prospective cohort study was cited. Quantitative Science Studies.
[2] Paul, O., Lepper, M. H., Phelan, W. H., Dupertuis, G. W., Macmillan, A., McKean, H., et al. (1963). A longitudinal study of coronary heart disease. Circulation, 28, 20-31. https://doi.org/10.1161/01.cir.28.1.20.
[3] Bastian, M., Heymann, S., & Jacomy, M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.
[4] Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Stable URL: https://sci2.cns.iu.edu
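
The Indegree, Outdegree and Main_component columns of the node attribute list can be recomputed from the edge list once the duplicated papers are dropped. The manuscript used Gephi for this; the following is only a minimal pure-Python sketch of the same quantities. The node Ids, the edges and the duplicate Id are invented toy values, and the function name is hypothetical; none of them come from the data set.

```python
from collections import deque

def degree_and_main_component(node_ids, edges, duplicates=frozenset()):
    """Return (indegree, outdegree, largest weakly connected component).

    edges are (Source, Target) pairs: Source cites Target.
    duplicates are node Ids to discard before analysis.
    """
    nodes = [n for n in node_ids if n not in duplicates]
    keep = set(nodes)
    # Drop every edge that touches a discarded node.
    kept_edges = [(s, t) for s, t in edges if s in keep and t in keep]

    indegree = {n: 0 for n in nodes}
    outdegree = {n: 0 for n in nodes}
    neighbours = {n: set() for n in nodes}  # direction ignored ("weak")
    for s, t in kept_edges:
        outdegree[s] += 1
        indegree[t] += 1
        neighbours[s].add(t)
        neighbours[t].add(s)

    # Breadth-first search over the undirected view of the network.
    seen, largest = set(), set()
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current] - component:
                component.add(nxt)
                queue.append(nxt)
        seen |= component
        if len(component) > len(largest):
            largest = component
    return indegree, outdegree, largest

# Toy stand-ins for the "Node attribute list" and "Edge list" workbooks.
toy_nodes = [1, 2, 3, 4, 5, 6]
toy_edges = [(2, 1), (3, 1), (3, 2), (5, 4), (6, 1)]
toy_duplicates = {6}  # plays the role of a row highlighted in yellow

indegree, outdegree, main_component = degree_and_main_component(
    toy_nodes, toy_edges, toy_duplicates
)
print(indegree)
print(outdegree)
print(sorted(main_component))
```

With the real workbooks, the Id column would supply the node Ids and the Source and Target columns the edge pairs; a node's Main_component flag then corresponds to membership in the returned set.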

Created: 2023-06-28