Dataset for "Continued use of retracted papers: Temporal trends in citations and (lack of) awareness of retractions shown in citation contexts in biomedicine"
Source: doi.org (indexed 2025-01-16)
Download link:
https://doi.org/10.13012/B2IDB-8255619_V2
Description:
This dataset includes five files. Descriptions of the files are given as follows:

<b>FILENAME: PubMed_retracted_publication_full_v3.tsv</b>
- Bibliographic data of retracted papers indexed in PubMed (retrieved on August 20, 2020, searched with the query "retracted publication" [PT]).
- Except for the information in the "cited_by" column, all the data is from PubMed.
- PMIDs in the "cited_by" column that meet either of the two conditions below have been excluded from analyses:
  [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file).
  [2] Citing paper and the cited retracted paper have the same PMID.

ROW EXPLANATIONS
- Each row is a retracted paper. There are 7,813 retracted papers.

COLUMN HEADER EXPLANATIONS
1) PMID - PubMed ID
2) Title - Paper title
3) Authors - Author names
4) Citation - Bibliographic information of the paper
5) First Author - First author's name
6) Journal/Book - Publication name
7) Publication Year
8) Create Date - The date the record was added to the PubMed database
9) PMCID - PubMed Central ID (if applicable, otherwise blank)
10) NIHMS ID - NIH Manuscript Submission ID (if applicable, otherwise blank)
11) DOI - Digital object identifier (if applicable, otherwise blank)
12) retracted_in - Information of retraction notice (given by PubMed)
13) retracted_yr - Retraction year identified from "retracted_in" (if applicable, otherwise blank)
14) cited_by - PMIDs of the citing papers (if applicable, otherwise blank). Data collected from iCite.
15) retraction_notice_pmid - PMID of the retraction notice (if applicable, otherwise blank)

<b>FILENAME: PubMed_retracted_publication_CitCntxt_withYR_v3.tsv</b>
- This file contains citation contexts (i.e., citing sentences) where the retracted papers were cited. The citation contexts were identified from the XML version of PubMed Central open access (PMCOA) articles.
- This is part of the data from: Hsiao, T.-K., & Torvik, V. I. (manuscript in preparation). Citation contexts identified from PubMed Central open access articles: A resource for text mining and citation analysis.
- Citation contexts that meet either of the two conditions below have been excluded from analyses:
  [1] PMIDs of the citing papers are from retraction notices (i.e., those in the "retraction_notice_PMID.csv" file).
  [2] Citing paper and the cited retracted paper have the same PMID.

ROW EXPLANATIONS
- Each row is a citation context associated with one retracted paper that's cited.
- In the manuscript, we count each citation context once, even if it cites multiple retracted papers.

COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) year - Publication year of the citing paper
4) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, tbl_fig_caption = tables and table/figure captions)
5) IMRaD - IMRaD section of the citation context (I = Introduction, M = Methods, R = Results, D = Discussions/Conclusion, NoIMRaD = not identified)
6) sentence_id - The ID of the citation context in a given location. For location information, please see column 4. The first sentence in the location gets the ID 1, and subsequent sentences are numbered consecutively.
7) total_sentences - Total number of sentences in a given location
8) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
9) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
10) citation - The citation context
11) progression - Position of a citation context by centile within the citing paper.
12) retracted_yr - Retraction year of the retracted paper
13) post_retraction - 0 = not post-retraction citation; 1 = post-retraction citation. A post-retraction citation is a citation made after the calendar year of retraction.

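As a rough illustration of the post_retraction definition and the count-each-context-once rule described above, the sketch below works on a few made-up rows, not on the real file. The helper names are hypothetical, and the key used to identify a unique citation context (pmcid, location, sentence_id) is an assumption; the description does not state which columns the authors used.

```python
def is_post_retraction(citing_year, retracted_year):
    """Return 1 if the citation was made after the calendar year of retraction, else 0."""
    return 1 if citing_year > retracted_year else 0


def count_unique_contexts(rows):
    """Count each citation context once, even if it cites several retracted papers.

    Assumption: (pmcid, location, sentence_id) identifies one citation context.
    """
    return len({(r["pmcid"], r["location"], r["sentence_id"]) for r in rows})


# Made-up rows for illustration only; they are not taken from the dataset.
rows = [
    {"pmcid": "PMC1", "location": "body", "sentence_id": 12,
     "intxt_pmid": "111", "year": 2015, "retracted_yr": 2012},
    {"pmcid": "PMC1", "location": "body", "sentence_id": 12,
     "intxt_pmid": "222", "year": 2015, "retracted_yr": 2015},
    {"pmcid": "PMC2", "location": "abstract", "sentence_id": 3,
     "intxt_pmid": "111", "year": 2010, "retracted_yr": 2012},
]

flags = [is_post_retraction(r["year"], r["retracted_yr"]) for r in rows]
unique_contexts = count_unique_contexts(rows)
print(flags, unique_contexts)
```

In this toy data the second row is cited in the same calendar year as the retraction, so it is not flagged, and the first two rows share one sentence, so they count as a single context.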
<b>FILENAME: 724_knowingly_post_retraction_cit.csv</b> (updated)
- The 724 post-retraction citation contexts that we determined knowingly cited the 7,813 retracted papers in "PubMed_retracted_publication_full_v3.tsv".
- Two citation contexts from retraction notices have been excluded from analyses.

ROW EXPLANATIONS
- Each row is a citation context.

COLUMN HEADER EXPLANATIONS
1) pmcid - PubMed Central ID of the citing paper
2) pmid - PubMed ID of the citing paper
3) pub_type - Publication type collected from the metadata in the PMCOA XML files.
4) pub_type2 - Specific article types. Please see the manuscript for explanations.
5) year - Publication year of the citing paper
6) location - Location of the citation context (abstract = abstract, body = main text, back = supporting material, table_or_figure_caption = tables and table/figure captions)
7) intxt_id - Identifier of a cited paper. Here, a cited paper is the retracted paper.
8) intxt_pmid - PubMed ID of a cited paper. Here, a cited paper is the retracted paper.
9) citation - The citation context
10) retracted_yr - Retraction year of the retracted paper
11) cit_purpose - Purpose of citing the retracted paper. This is from human annotations. Please see the manuscript for further information about annotation.
12) longer_context - An extended version of the citation context (if applicable, otherwise blank). Manually pulled from the full texts in the process of annotation.

<b>FILENAME: Annotation manual.pdf</b>
- The manual for annotating the citation purposes in column 11) of the 724_knowingly_post_retraction_cit.tsv.

<b>FILENAME: retraction_notice_PMID.csv</b> (new file added for this version)
- A list of 8,346 PMIDs of retraction notices indexed in PubMed (retrieved on August 20, 2020, searched with the query "retraction of publication" [PT]).
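The two exclusion conditions applied to the "cited_by" column could be reproduced along the following lines. This is a minimal sketch under stated assumptions, not the authors' code: the function name is hypothetical, the sample values are made up, and the separator between PMIDs inside a "cited_by" cell is assumed to be whitespace, commas, or semicolons because the description does not specify it.

```python
import re


def filter_cited_by(cited_by, retracted_pmid, notice_pmids):
    """Drop citing PMIDs that are retraction notices or the retracted paper itself."""
    # Assumption: PMIDs in the cell are separated by whitespace, commas, or semicolons.
    citing = [p for p in re.split(r"[,;\s]+", cited_by.strip()) if p]
    return [
        p for p in citing
        if p not in notice_pmids      # condition [1]: citing paper is a retraction notice
        and p != retracted_pmid       # condition [2]: citing and cited PMIDs are identical
    ]


# Made-up values for illustration only; they are not taken from the dataset.
notice_pmids = {"900"}
kept = filter_cited_by("100 900 555 200", retracted_pmid="555", notice_pmids=notice_pmids)
print(kept)
```

With the real files, `notice_pmids` would presumably be built from retraction_notice_PMID.csv and the function applied row by row to the "PMID" and "cited_by" columns, for example via `csv.DictReader` with `delimiter="\t"`.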
Provider:
Illinois Data Bank



