CZ Software Mentions Dataset

Source: arXiv. Updated: 2022-09-28. Indexed: 2024-06-21.
Download link:
https://doi.org/10.5061/dryad.6wwpzgn2c
Overview:
The CZ Software Mentions Dataset is a dataset created by the Chan Zuckerberg Initiative (CZI) that focuses on software mentions in biomedical papers. It contains 1.12 million unique software mentions extracted with a trained SciBERT model from the NIH PubMed Central collection and from papers provided by multiple publishers. Besides the source and context of each mention, the dataset includes extensive metadata, and many of the mentioned software entities have been disambiguated and linked. The dataset was created to help assess the impact of software, especially scientific open-source projects, on science, and it has been publicly released for community use.
Provider:
Chan Zuckerberg Initiative
Created:
2022-09-02