iCite Database Snapshot 2025-11
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/iCite_Database_Snapshot_2025-10/30767501
Description:
This is a database snapshot of the iCite web service (provided here as a single zipped CSV file, or compressed, tarred JSON files). In addition, citation links in the NIH Open Citation Collection are provided as a two-column CSV table in open_citation_collection.zip. iCite provides bibliometrics and metadata on publications indexed in PubMed, organized into three modules:
Influence: Delivers metrics of scientific influence, field-adjusted and benchmarked to NIH publications as the baseline.
Translation: Measures how Human, Animal, or Molecular/Cellular Biology-oriented each paper is; tracks and predicts citation by clinical articles.
Open Cites: Disseminates link-level, public-domain citation data from the NIH Open Citation Collection.
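As a rough illustration of how the two-column citation table relates to the per-article cited_by and references fields defined below, the following sketch indexes citation links in both directions. It assumes the first column holds the citing PMID and the second the referenced PMID, and that the file has a header row; the header names, the helper names read_links and index_citation_links, and the toy PMIDs are all hypothetical and should be checked against the actual file.

```python
import csv
import io
from collections import defaultdict


def read_links(handle, has_header=True):
    """Yield (citing_pmid, referenced_pmid) pairs from a two-column CSV.

    Assumption: column 1 is the citing article, column 2 the cited one.
    """
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row if present
    for row in reader:
        if len(row) >= 2:
            yield int(row[0]), int(row[1])


def index_citation_links(pairs):
    """Build both directions of the citation graph from link pairs."""
    references = defaultdict(set)  # pmid -> PMIDs in its reference list
    cited_by = defaultdict(set)    # pmid -> PMIDs of articles citing it
    for citing, referenced in pairs:
        references[citing].add(referenced)
        cited_by[referenced].add(citing)
    return references, cited_by


# Toy data with made-up PMIDs, standing in for the real CSV contents.
sample = io.StringIO("citing,referenced\n1,2\n1,3\n2,3\n")
references, cited_by = index_citation_links(read_links(sample))
print(sorted(references[1]), sorted(cited_by[3]))
```

Sets are used so that a repeated link is only counted once, which matches the "unique articles" wording of citation_count. For the full collection, holding every link in memory may not be practical, and a database or chunked processing would likely be a better fit.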
Definitions for individual data fields:
pmid: PubMed Identifier, an article ID as assigned in PubMed by the National Library of Medicine
doi: Digital Object Identifier, if available
year: Year the article was published
title: Title of the article
authors: List of author names
journal: Journal name (ISO abbreviation)
is_research_article: Flag indicating whether the Publication Type tags for this article are consistent with that of a primary research article
relative_citation_ratio: Relative Citation Ratio (RCR)--OPA's metric of scientific influence. Field-adjusted, time-adjusted and benchmarked against NIH-funded papers. The median RCR for NIH-funded papers in any field is 1.0. An RCR of 2.0 means a paper is receiving twice as many citations per year as the median NIH-funded paper in its field and year, while an RCR of 0.5 means that it is receiving half as many citations per year. Calculation details are documented in Hutchins et al., PLoS Biol. 2016;14(9):e1002541.
provisional: RCRs for papers published in the previous two years are flagged as "provisional", to reflect that citation metrics for newer articles are not necessarily as stable as they are for older articles. Provisional RCRs are provided for papers published in the previous year if they have received 5 or more citations, despite being, in many cases, less than a year old. All papers published the year before the previous year receive provisional RCRs. The current year is considered to be the NIH Fiscal Year, which starts in October. For example, in July 2019 (NIH Fiscal Year 2019), papers from 2018 receive provisional RCRs if they have 5 citations or more, and all papers from 2017 receive provisional RCRs. In October 2019, at the start of NIH Fiscal Year 2020, papers from 2019 receive provisional RCRs if they have 5 citations or more and all papers from 2018 receive provisional RCRs.
citation_count: Number of unique articles that have cited this one
citations_per_year: Citations per year that this article has received since its publication. If this appeared as a preprint and a published article, the year from the published version is used as the primary publication date. This is the numerator for the Relative Citation Ratio.
field_citation_rate: Measure of the intrinsic citation rate of this paper's field, estimated using its co-citation network.
expected_citations_per_year: Citations per year that NIH-funded articles, with the same Field Citation Rate and published in the same year as this paper, receive. This is the denominator for the Relative Citation Ratio.
nih_percentile: Percentile rank of this paper's RCR compared to all NIH publications. For example, 95% indicates that this paper's RCR is higher than 95% of all NIH funded publications.
human: Fraction of MeSH terms that are in the Human category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
animal: Fraction of MeSH terms that are in the Animal category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
molecular_cellular: Fraction of MeSH terms that are in the Molecular/Cellular Biology category (out of this article's MeSH terms that fall into the Human, Animal, or Molecular/Cellular Biology categories)
x_coord: X coordinate of the article on the Triangle of Biomedicine
y_coord: Y coordinate of the article on the Triangle of Biomedicine
is_clinical: Flag indicating that this paper meets the definition of a clinical article.
cited_by_clin: PMIDs of clinical articles that this article has been cited by.
apt: Approximate Potential to Translate is a machine learning-based estimate of the likelihood that this publication will be cited in later clinical trials or guidelines. Calculation details are documented in Hutchins et al., PLoS Biol. 2019;17(10):e3000416.
cited_by: PMIDs of articles that have cited this one.
references: PMIDs of articles in this article's reference list.
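The numerator/denominator relationship for the RCR and the provisional rule described in the field definitions can be sketched as follows. This is a minimal illustration of the rules as worded here, not iCite's implementation: the function names and the "provisional" / "final" / "none" labels are hypothetical, and the handling of papers dated in the current fiscal year (no RCR) is an assumption, since the definitions do not cover that case.

```python
from datetime import date


def nih_fiscal_year(today: date) -> int:
    """NIH Fiscal Year, which starts in October of the prior calendar year."""
    return today.year + 1 if today.month >= 10 else today.year


def rcr_status(pub_year: int, citation_count: int, today: date) -> str:
    """Classify a paper's RCR as 'provisional', 'final', or 'none'."""
    fy = nih_fiscal_year(today)
    if pub_year == fy - 1:
        # Previous year: provisional RCR only with 5 or more citations.
        return "provisional" if citation_count >= 5 else "none"
    if pub_year == fy - 2:
        # Year before the previous year: always provisional.
        return "provisional"
    if pub_year < fy - 2:
        return "final"
    # Assumption: papers dated in the current fiscal year get no RCR yet.
    return "none"


def relative_citation_ratio(citations_per_year, expected_citations_per_year):
    """RCR as citations_per_year over expected_citations_per_year."""
    if not expected_citations_per_year:
        return None  # undefined without a positive denominator
    return citations_per_year / expected_citations_per_year


# The July 2019 example from the provisional definition.
print(rcr_status(2018, 5, date(2019, 7, 1)))
print(rcr_status(2017, 0, date(2019, 7, 1)))
print(relative_citation_ratio(4.0, 2.0))
```

Recomputing the ratio from the two published fields is mainly useful as a consistency check on a loaded snapshot; small differences from the stored relative_citation_ratio could arise from rounding.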
Large CSV files are zipped using zip version 4.5, which is more recent than the version supported by the default unzip command-line utility in some common Linux distributions. These files can be unzipped with tools that support version 4.5 or later, such as 7-Zip.
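Where 7-Zip is not available, Python's standard zipfile module may be a workable alternative, since it reads archives that use the ZIP64 extensions. The sketch below is a generic extraction helper, not something provided by iCite; the function name extract_snapshot and the file names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_snapshot(archive_path, dest_dir):
    """Extract every member of a zip archive and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        names = zf.namelist()
        zf.extractall(dest)
    return names


# Hypothetical usage; substitute the file actually downloaded:
# extract_snapshot("icite_metadata.zip", "icite_snapshot")
```

The extracted CSV can be large, so it is worth checking free disk space before running this on the full snapshot.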
Comments and questions can be addressed to iCite@mail.nih.gov
Created:
2025-12-02



