five

Supporting data for "Computational reproducibility of Jupyter notebooks from biomedical publications"|生物医学数据集|计算可重复性数据集

收藏
Mendeley Data2024-02-10 更新2024-06-30 收录
生物医学
计算可重复性
下载链接:
http://gigadb.org/dataset/102478
下载链接
链接失效反馈
资源简介:
Jupyter notebooks facilitate the bundling of executable code with its documentation and output in one interactive environment, and they represent a popular mechanism to document and share computational workflows. The reproducibility of computational aspects of research is a key component of scientific reproducibility but has not yet been assessed at scale for Jupyter notebooks associated with biomedical publications. We address computational reproducibility at two levels: First, using fully automated workflows, we analyzed the computational reproducibility of Jupyter notebooks related to publications indexed in PubMed Central. We identified such notebooks by mining the articles full text, locating them on GitHub and re-running them in an environment as close to the original as possible. We documented reproduction success and exceptions and explored relationships between notebook reproducibility and variables related to the notebooks or publications. Second, this study represents a reproducibility attempt in and of itself, using essentially the same methodology twice on PubMed Central over two years. Out of 27271 notebooks from 2660 GitHub repositories associated with 3467 articles, 22578 notebooks were written in Python, including 15817 that had their dependencies declared in standard requirement files and that we attempted to re-run automatically. For 10388 of these, all declared dependencies could be installed successfully, and we re-ran them to assess reproducibility. Of these, 1203 notebooks ran through without any errors, including 879 that produced results identical to those reported in the original notebook and 324 for which our results differed from the originally reported ones. Running the other notebooks resulted in exceptions. We zoom in on common problems, highlight trends and discuss potential improvements to Jupyter-related workflows associated with biomedical publications.
创建时间:
2024-02-10
用户留言
有没有相关的论文或文献参考?
这个数据集是基于什么背景创建的?
数据集的作者是谁?
能帮我联系到这个数据集的作者吗?
这个数据集如何下载?
点击留言
数据主题
具身智能
数据集  4099个
机构  8个
大模型
数据集  439个
机构  10个
无人机
数据集  37个
机构  6个
指令微调
数据集  36个
机构  6个
蛋白质结构
数据集  50个
机构  8个
空间智能
数据集  21个
机构  5个
5,000+
优质数据集
54 个
任务类型
进入经典数据集
热门数据集

UIEB, U45, LSUI

本仓库提供了水下图像增强方法和数据集的实现,包括UIEB、U45和LSUI等数据集,用于支持水下图像增强的研究和开发。

github 收录

ECNU-SEA/SEA_data

该数据集包含四种类型的文件:原始PDF格式的论文、通过Nougat解析后的mmd文件、爬取的原始评审文本以及处理后的评审JSON文件。数据集来源于OpenReview,包括NeurIPS-2023和ICLR-2024的最新论文及其评审。

hugging_face 收录

ZINC

ZINC 是用于虚拟筛选的商用化合物的免费数据库。 ZINC 包含超过 2.3 亿种可购买的即用型 3D 格式化合物。 ZINC 还包含超过 7.5 亿种可购买的化合物,可用于搜索类似物。

OpenDataLab 收录

Breast Cancer Dataset

该项目专注于清理和转换一个乳腺癌数据集,该数据集最初由卢布尔雅那大学医学中心肿瘤研究所获得。目标是通过应用各种数据转换技术(如分类、编码和二值化)来创建一个可以由数据科学团队用于未来分析的精炼数据集。

github 收录

FAOSTAT Agricultural Data

FAOSTAT Agricultural Data 是由联合国粮食及农业组织(FAO)提供的全球农业数据集。该数据集涵盖了农业生产、贸易、价格、土地利用、水资源、气候变化、人口统计等多个方面的详细信息。数据包括了全球各个国家和地区的农业统计数据,旨在为政策制定者、研究人员和公众提供全面的农业信息。

www.fao.org 收录