Citation network dataset covering the work of RP Millar and its citing literature

Indexed in NIAID Data Ecosystem, 2026-05-02
Download link:
https://zenodo.org/record/11534256
Resource description:
The following describes the citation network dataset that underpins the manuscript “A career in numbers: a citation network analysis of the work of RP Millar and his contribution to GnRH research” [1].

Data collection

We retrieved data from the Web of Science (WoS) Core Collection under the University of Edinburgh’s subscription in January 2024, aiming to retrieve all indexed papers of Professor Robert P. Millar (RPM). We searched with the query AU = (Millar, R) and then retrieved the records that corresponded to his WoS profile (n = 428). We added a further 49 papers that were authored by RPM but had not been included in his WoS profile, validating all records against a CV of his published works. We then retrieved the full citation history of these papers, as recorded by Web of Science, from other indexed records. By the date of retrieval, the 477 RPM papers had been cited 21,677 times by 11,138 documents. Removing self-citations left 19,256 citations by 10,719 documents. Finally, we retrieved all WoS metadata for the 477 RPM papers and the 10,719 citing papers, resulting in a dataset covering 11,196 documents.

Citation network dataset

We constructed a citation network dataset by parsing each paper’s full bibliography. It consists of an edge list and a node-attribute list.

‘Edge-list’ records citation links from a citing document to a cited document. It is constructed by assigning unique IDs to each retrieved paper and to every unique reference string contained in their bibliographies. The edge list has a ‘Source’ column containing the ID of the citing document and a ‘Target’ column containing the ID of the cited document, with one citation per row. Because we were only interested in citations between the retrieved WoS documents, we discarded any reference string that represented a document outside our search.
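The edge-list construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ code. It assumes that each retrieved paper is keyed by its own reference string. It also assumes that a bibliography entry matches a retrieved paper only when the two strings are identical; real WoS reference strings may need normalising before matching. The function name `build_edge_list` and the sample strings are hypothetical.

```python
from typing import Dict, List, Tuple


def build_edge_list(
    bibliographies: Dict[str, List[str]],
) -> Tuple[Dict[str, int], List[Tuple[int, int]]]:
    """Hypothetical sketch: map each retrieved paper's own reference string
    to the reference strings in its bibliography, and return
    (IDs for every unique string, list of (Source, Target) citation edges)."""
    ids: Dict[str, int] = {}

    def assign(ref: str) -> int:
        # Give each unique string an integer ID in order of first appearance.
        if ref not in ids:
            ids[ref] = len(ids)
        return ids[ref]

    # IDs for the retrieved papers first, then for every unique bibliography entry.
    for paper in bibliographies:
        assign(paper)
    for refs in bibliographies.values():
        for ref in refs:
            assign(ref)

    # Assumption: exact string equality identifies a retrieved document.
    retrieved = set(bibliographies)
    edges: List[Tuple[int, int]] = []
    for citing, refs in bibliographies.items():
        for ref in dict.fromkeys(refs):  # drop duplicate entries, keep order
            if ref in retrieved:  # discard references outside the search
                edges.append((ids[citing], ids[ref]))
    return ids, edges


if __name__ == "__main__":
    ids, edges = build_edge_list(
        {"A 2001": ["B 1999", "X 1980"], "B 1999": ["C 1990"], "C 1990": []}
    )
    print(edges)  # [(0, 1), (1, 2)]
```

In the example, "X 1980" still receives an ID, because every unique reference string is numbered. Its citation is dropped, however, because it is not one of the retrieved papers.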
‘Node-attribute list’ contains each document’s ID, with metadata in adjacent columns to identify the document, including authors, title, journal and year of publication. We also parsed into this list the full WoS citation count for each paper and the total number of references in its bibliography. This produced a network of 11,196 nodes and 115,834 edges.

We removed 67 papers whose metadata were incomplete and/or corrupted. We then focused on the largest connected component, removing nodes with no connections (isolates) and smaller components detached from the main network. We also excluded papers with fewer than 10 references, to remove meeting abstracts and other minor journal items, as well as papers not published in English. The final dataset contains 10,901 nodes and 113,742 edges. We share this dataset because it is the basis for the analyses in the manuscript.

Description of dataset variables

‘RPM_Edgelist.csv’ is a comma-separated values file containing all 113,742 citations between the 10,901 documents of the citation network analysed in the manuscript. Its columns are:
‘Source’: unique identifier of the citing document
‘Target’: unique identifier of the cited document
‘Syr’: year of publication of the citing document
‘Tyr’: year of publication of the cited document
‘SC’: cluster ID of the citing document
‘TC’: cluster ID of the cited document

‘RPM_Nodelist.csv’ is a comma-separated values file containing the 10,901 documents of the citation network analysed in the manuscript. Its columns are:
‘Id’: unique ID assigned to the document, corresponding to the IDs in the edge list
‘Reference string’: reference string of the document
‘WoS ID’: unique accession number assigned to the document by Web of Science; it can be used to query WoS for further data on any paper via the ‘UT=’ field tag
‘Authors’: all authors, formatted as full last name and initials
‘# of authors’: number of authors
‘Title’: title of the document
‘Publication year’: publication year of the document
‘Document type’: document type as defined by WoS (e.g. article, review)
‘Total references’: total number of references in the document’s bibliography, as recorded by WoS
‘Total WoS citations’: total number of citations to the document from other documents indexed in Web of Science
‘Indegree’: number of within-network citations (i.e. counting only citations from other papers retrieved by our query)
‘Outdegree’: number of within-network references (i.e. counting only references to other papers retrieved by our query)
‘Degree’: total number of node connections (i.e. indegree + outdegree)
‘Class’: distinguishes RPM’s publications (‘RPM’) from the citing documents (‘CITE’)
‘Cluster’: cluster membership number, as discussed in the manuscript. Clusters were identified by modularity maximisation using the Leiden algorithm (resolution = 1; Q = 0.67; 25 clusters).

References

[1] Leng, R. I., & Leng, G. (under review). A career in numbers: a citation network analysis of the work of RP Millar and his contribution to GnRH research. J. Neuroendocrinol.

All bibliographic data included in this study are derived originally from Clarivate™ (Web of Science™) and were downloaded in January 2024. © Clarivate 2024. All rights reserved.
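For readers working with the shared files, the sketch below shows how some of the node-list variables relate to the edge list. It uses only Python’s standard library. This is a hypothetical example, not part of the dataset. It assumes the column headers listed above and UTF-8 text, and all function names are illustrative. It computes the following:
- within-network indegree, outdegree and degree from ‘Source’ and ‘Target’;
- the share of citations whose citing and cited documents fall in the same cluster (‘SC’ equal to ‘TC’);
- the largest connected component, ignoring edge direction.

```python
import csv
from collections import Counter, defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str, str]  # (Source, Target, SC, TC)


def read_edges(path: str) -> List[Edge]:
    # Assumption: headers as described above; UTF-8 text, possibly with a BOM.
    with open(path, newline="", encoding="utf-8-sig") as fh:
        return [(r["Source"], r["Target"], r["SC"], r["TC"])
                for r in csv.DictReader(fh)]


def degrees(edges: List[Edge]) -> Dict[str, Tuple[int, int, int]]:
    """Return {id: (indegree, outdegree, degree)} counted within the network."""
    indeg = Counter(t for _, t, _, _ in edges)   # citations received
    outdeg = Counter(s for s, _, _, _ in edges)  # references made
    nodes = set(indeg) | set(outdeg)
    return {n: (indeg[n], outdeg[n], indeg[n] + outdeg[n]) for n in nodes}


def within_cluster_share(edges: List[Edge]) -> float:
    """Fraction of citations whose citing and cited documents share a cluster."""
    if not edges:
        return 0.0
    return sum(sc == tc for _, _, sc, tc in edges) / len(edges)


def largest_component(edges: List[Edge]) -> Set[str]:
    """Nodes of the largest connected component, treating edges as undirected."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for s, t, _, _ in edges:
        adj[s].add(t)
        adj[t].add(s)
    seen: Set[str] = set()
    best: Set[str] = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects one component.
        comp = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best


if __name__ == "__main__":
    edges = read_edges("RPM_Edgelist.csv")  # path assumed: local copy of the file
    print(len(edges), within_cluster_share(edges), len(largest_component(edges)))
```

One possible consistency check is to compare the output of `degrees` with the ‘Indegree’, ‘Outdegree’ and ‘Degree’ columns of ‘RPM_Nodelist.csv’. Any differences may indicate that those columns were computed at a different filtering stage.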
Created:
2024-06-09