
biorXiv-pdf

ModelScope community · Updated 2025-12-05 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/laion/biorXiv-pdf
Resource description:
<div align="center">
  <img src="biorXiv.jpg" alt="ChemrXiv Pdf" width="250"/>
  <p><b>BiorXiv Pdf</b></p>
</div>

**BiorXiv PDF dataset** is a collection of PDF documents gathered from the bioRxiv website. This initiative aims to democratize artificial intelligence research by giving researchers readily available training datasets. It is part of our broader effort to publish open-access research papers as collective datasets.

bioRxiv is a well-known preprint server for biology and related disciplines. It is operated by Cold Spring Harbor Laboratory (CSHL) with support from the Chan Zuckerberg Initiative, which was co-founded by Mark Zuckerberg and Priscilla Chan.

We anticipate that researchers and enthusiasts will use this dataset to train and develop domain-specific scientific models, and to fine-tune existing models for specialized genres and applications.

We kindly request that all users employ the dataset responsibly and adhere to the licensing terms set by the paper owners.

### Dataset information

**Index date:** 15 September 2024

**Total no. of PDFs:** 340,708

**Downloaded no. of PDFs:** 245,586

**Dataset size:** 1.1 TB

### Why are there fewer downloaded PDFs than total PDFs?

The bioRxiv API could not fetch the remaining PDFs because of an error on its side. We promise to fix this, but it will take some time.

### What's the licence?

Most PDFs are usable under Creative Commons (CC) and other permissive licences. Unfortunately, some may be restrictive or marked "NON REUSABLE". Our metadata does not provide licence information yet, but we plan to add it within the next two weeks.
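
As a quick sanity check on the figures under "Dataset information", the short Python sketch below works out how many PDFs are still missing, what share of the index has been downloaded, and a rough average file size. The variable names are illustrative, and the size estimate assumes that 1.1 TB means decimal terabytes (10^12 bytes), which the dataset card does not specify.

```python
# Figures copied from the "Dataset information" section.
TOTAL_PDFS = 340_708
DOWNLOADED_PDFS = 245_586

# Assumption: "1.1TB" is decimal terabytes (1 TB = 10**12 bytes).
DATASET_BYTES = 1.1e12

# PDFs listed in the index but not yet downloaded.
missing = TOTAL_PDFS - DOWNLOADED_PDFS

# Fraction of indexed PDFs that are present in the dataset.
coverage = DOWNLOADED_PDFS / TOTAL_PDFS

# Rough mean size of one downloaded PDF, in megabytes.
avg_mb = DATASET_BYTES / DOWNLOADED_PDFS / 1e6

print(f"missing PDFs: {missing:,}")              # missing PDFs: 95,122
print(f"coverage: {coverage:.1%}")               # coverage: 72.1%
print(f"average size per PDF: {avg_mb:.1f} MB")  # average size per PDF: 4.5 MB
```

Roughly 28% of the indexed papers are therefore absent from the current release, which is worth keeping in mind when treating the dataset as a complete snapshot of the index.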

Provider:
maas
Created:
2025-10-14