bioRxiv 10k with assets
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5592546
Description:
This dataset is a superset of the bioRxiv 10k dataset. It additionally includes the assets (usually images) that are linked by the XML files.
The assets were retrieved from bioRxiv's TDM bucket. Assets that are not linked by the XML, such as some large videos, were omitted.
This dataset serves a similar purpose to the bioRxiv 10k dataset, but targets use cases that require the assets, e.g. training and evaluation of figure image extraction.
This dataset mirrors exactly the same documents and structure as the bioRxiv 10k dataset. Rather than containing just the PDF and XML files, it also contains the linked assets (often images, but not necessarily).
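To illustrate what "assets linked by the XML" can mean in practice, here is a minimal sketch that lists the files an XML document refers to. It assumes JATS-style markup in which `graphic`, `inline-graphic` and `media` elements point to their files through an `xlink:href` attribute; the description above does not specify the markup, so treat this as an assumption. The function name `linked_asset_hrefs` and the sample XML are hypothetical and are not taken from the dataset.

```python
import xml.etree.ElementTree as ET

# Assumption: assets are referenced via xlink:href, as in JATS-style XML.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Assumption: these element names carry asset references.
ASSET_TAGS = {"graphic", "inline-graphic", "media"}


def linked_asset_hrefs(xml_text: str) -> list:
    """Return the unique xlink:href values of asset elements, in document order."""
    root = ET.fromstring(xml_text)
    hrefs = []
    for element in root.iter():
        if element.tag not in ASSET_TAGS:
            continue
        href = element.get(XLINK_HREF)
        if href is not None and href not in hrefs:
            hrefs.append(href)
    return hrefs


# Hypothetical sample document, invented for illustration only.
SAMPLE_XML = """<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <p>See <ext-link xlink:href="https://example.org">this page</ext-link>.</p>
    <fig id="f1"><graphic xlink:href="fig1.tif"/></fig>
    <fig id="f2"><graphic xlink:href="fig2.tif"/></fig>
    <supplementary-material><media xlink:href="movie1.mp4"/></supplementary-material>
  </body>
</article>"""

if __name__ == "__main__":
    for href in linked_asset_hrefs(SAMPLE_XML):
        print(href)
```

An asset file that never appears in such a list would be one that is "not linked by the XML" in the sense used above. External hyperlinks (`ext-link`) are deliberately skipped because they do not refer to files shipped with the document.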
The dataset was created as part of eLife's ScienceBeam project.
Created:
2021-10-29



