Dataset of 286 publications citing the 2014 Willoughby-Jansma-Hoye protocol

Indexed from doi.org on 2025-01-16
Download link:
https://doi.org/10.13012/B2IDB-4610831_V3
Description:
This dataset consists of the 286 publications retrieved from Web of Science and Scopus on July 6, 2023 as citations for Willoughby et al., 2014: Patrick H. Willoughby, Matthew J. Jansma, and Thomas R. Hoye (2014). A guide to small-molecule structure assignment through computation of (¹H and ¹³C) NMR chemical shifts. Nature Protocols, 9(3), Article 3. https://doi.org/10.1038/nprot.2014.042

We added the DOIs of the citing publications to a Zotero collection. Then we exported all 286 DOIs in two formats: a .csv file (data export) and an .rtf file (bibliography).

<b>Willoughby2014_286citing_publications.csv</b> is a Zotero data export of the citing publications.

<b>Willoughby2014_286citing_publications.rtf</b> is a bibliography of the citing publications, using a variation of the American Psychological Association style (7th edition) with full names instead of initials.

To create <b>Willoughby2014_citation_contexts.csv</b>, HZ manually extracted the paragraphs that contain a citation marker for Willoughby et al., 2014. We refer to these paragraphs as the citation contexts of Willoughby et al., 2014. Manual extraction started with the 286 citing publications but excluded 2 publications that are not in English, those with DOIs 10.13220/j.cnki.jipr.2015.06.004 and 10.19540/j.cnki.cjcmm.20200604.201.

The silver standard aimed to triage the citing publications of Willoughby et al., 2014 that are at risk of propagating unreliability due to a code glitch in a computational chemistry protocol introduced in Willoughby et al., 2014. The silver standard was created stepwise. First, one chemistry expert (YF) manually annotated the corpus of 284 citing publications in English, using their full text and citation contexts. She manually categorized each publication as either at risk of propagating unreliability or not at risk of propagating unreliability, with a rationale justifying each category.

Then we selected a representative sample of citation contexts to be double annotated. To do this, MJS turned the full dataset of citation contexts (Willoughby2014_citation_contexts.csv) into word embeddings, clustered them by similarity using BERTopic's HDBSCAN, and selected representative citation contexts based on the centroids of the clusters. Next, the second chemistry expert (EV) annotated the 77 publications associated with these citation contexts, considering the full text as well as the citation contexts.

<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotations before reconciliation.

To create the silver standard, YF, EV, and JS discussed the differences and reconciled most of them. YF and EV had principled reasons for disagreeing on 9 publications; to handle these, YF updated the annotations to create the silver standard we use for evaluation in the remainder of our JCDL 2024 paper (<b>silver_standard.csv</b>).

<b>Inter_Annotator_Agreement.xlsx</b> indicates the publications where the two annotators made opposite decisions and calculates the inter-annotator agreement before and after reconciliation.

<b>double_annotated_subset_77_before_reconciliation.csv</b> provides EV and YF's annotations after reconciliation, including applying the reconciliation policy.
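
The agreement calculation mentioned above could be reproduced along the following lines. This is only a minimal sketch: it assumes Cohen's kappa as the agreement statistic (the description does not say which measure Inter_Annotator_Agreement.xlsx uses), and the label strings `at_risk` / `not_at_risk`, the function name `cohens_kappa`, and the example annotations are all hypothetical, not taken from the dataset files.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations for six publications (illustration only).
yf = ["at_risk", "at_risk", "not_at_risk", "not_at_risk", "at_risk", "not_at_risk"]
ev = ["at_risk", "not_at_risk", "not_at_risk", "not_at_risk", "at_risk", "at_risk"]

kappa = cohens_kappa(yf, ev)
print(f"kappa = {kappa:.3f}")
```

Running the same function on the two annotators' label columns before and after reconciliation would give the two agreement figures to compare; how those columns are named in the CSV files is not stated in the description and would need to be checked against the files themselves.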
Provider:
Illinois Data Bank