CORD-19 Software Mentions
Source: NIAID Data Ecosystem; indexed 2026-03-12
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.vmcvdncs0
Description:
To automate the identification and analysis of software use in biomedical research, we developed a SciBERT-based machine learning model that extracts mentions of software from scientific articles. The model takes the full text of a scientific article as input and outputs a list of the software mentioned in it. We applied this model to the CORD-19 full-text articles and stored the output in this dataset, which includes metadata for over 77,000 COVID-19 and coronavirus-related papers and a list of the software tools mentioned in each.
Methods
We have developed a machine learning model to extract mentions of software from scientific articles. The SoftCite dataset was used to train and evaluate the model. This model has been applied to the CORD-19 collection of full-text coronavirus-related research papers. This dataset comprises the output of this model and each scientific article's relevant metadata.
Data are derived from the CORD-19 dataset provided by AllenAI, release version 2021-02-08 (changelog cord-19_2021-02-08.tar.gz 7.4GB c5446fea 29f69de2) downloaded from AWS on 08-Feb-2021.
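As an illustration of how per-article software lists like those described above might be summarized, here is a minimal Python sketch. The record layout and the field names (`cord_uid`, `title`, `software`) are assumptions made for this example, not the dataset's documented schema, and the three records are invented placeholders rather than real entries.

```python
from collections import Counter

# Hypothetical records: the field names ("cord_uid", "title", "software")
# are assumptions for illustration, not the dataset's actual schema.
records = [
    {"cord_uid": "a1", "title": "Paper A", "software": ["SPSS", "R", "SPSS"]},
    {"cord_uid": "b2", "title": "Paper B", "software": ["R", "ImageJ"]},
    {"cord_uid": "c3", "title": "Paper C", "software": []},
]


def count_papers_per_software(records):
    """Count how many papers mention each software tool at least once."""
    counts = Counter()
    for record in records:
        # Deduplicate within a paper so repeated mentions count once.
        counts.update(set(record["software"]))
    return counts


paper_counts = count_papers_per_software(records)
print(paper_counts.most_common())
```

Counting each tool once per paper, rather than once per mention, keeps a single article that names the same tool many times from dominating the totals. Whether that is the right choice depends on the question being asked.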
Created:
2021-03-05



