
CORD-19 Software Mentions

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.vmcvdncs0
Description:
In an effort to automate the process of identifying and analyzing the use of software in biomedical research, we have developed a SciBERT-based machine learning model to extract mentions of software from scientific articles. The input to this model is the full text from a scientific article and the output is a list of mentioned software within it. We applied this model to the CORD-19 full-text articles and stored the output in this dataset, which includes metadata of over 77,000 COVID-19 and coronavirus-related papers and a list of software tools mentioned in each.

Methods:
We have developed a machine learning model to extract mentions of software from scientific articles. The SoftCite dataset was used to train and evaluate the model. This model has been applied to the CORD-19 collection of full-text coronavirus-related research papers. This dataset comprises the output of this model and each scientific article's relevant metadata.

Data are derived from the CORD-19 dataset provided by AllenAI, release version 2021-02-08 (changelog cord-19_2021-02-08.tar.gz 7.4GB c5446fea 29f69de2) downloaded from AWS on 08-Feb-2021.
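Since the dataset pairs each paper with a list of mentioned software tools, one natural first analysis is to count how many papers mention each tool. The following is a minimal sketch only: the record layout (a paper identifier plus a list of software names), the function name `count_software_mentions`, and the sample values are all assumptions made for illustration, not the dataset's actual schema or contents.

```python
from collections import Counter


def count_software_mentions(records):
    """Count in how many papers each software name appears.

    `records` is an iterable of (paper_id, [software names]) pairs.
    This layout is hypothetical; adapt it to the real file format.
    Names are stripped and lowercased, and each name is counted at
    most once per paper.
    """
    counts = Counter()
    for _paper_id, mentions in records:
        # Deduplicate within a paper so repeated mentions count once.
        unique = {m.strip().lower() for m in mentions if m.strip()}
        counts.update(unique)
    return counts


# Made-up sample records, not taken from the dataset.
sample = [
    ("paper-1", ["R", "ImageJ", "r"]),
    ("paper-2", ["ImageJ", "GraphPad Prism"]),
    ("paper-3", []),
]

counts = count_software_mentions(sample)
most_mentioned = counts.most_common(1)
```

Lowercasing is a crude normalization; real software names often need more careful matching (versions, abbreviations, spelling variants), so treat the counts from a sketch like this as approximate.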
Created:
2021-03-05