Alkaloids-dataset

Source: DataCite Commons. Updated: 2025-04-07. Indexed: 2025-04-16.
Download link:
https://data.mendeley.com/datasets/834p2nf8g2/1
Description:
This dataset contains 5,813 open-access scientific articles related to alkaloids. Articles were sourced from PubMed Central and ChemRxiv, focusing on literature suitable for NLP and QA system development in the chemistry domain.

All PMC articles were filtered using "free full text" and downloaded only if a PMCID and an open PDF were available. ChemRxiv articles were accessed through the official API and include only those with downloadable open-access PDFs.

Original PDF files are not included in this dataset due to file size and licensing constraints. Instead, each article is provided in plain-text format. Texts were extracted using the PyMuPDF library and lightly cleaned for consistency. Metadata is included for each article in JSON format, listing title, authors, publication year, source, DOI, and file paths.

The dataset is structured as follows:
1) metadata.json: metadata for all articles
2) extracted_texts/: plain-text versions extracted from PDFs
3) README.md

The dataset supports research in question answering, information retrieval, summarization, and chemical entity recognition. This dataset is for non-commercial academic research only.
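
The layout described above (one metadata.json plus a folder of plain-text files) can be read with a few lines of Python. The following is a minimal sketch, not the dataset's official loader. It assumes that metadata.json holds either a list of records or a mapping from some identifier to a record, and that each record stores the relative path of its text file under a key named "text_path". That key name is hypothetical, since the description only says the metadata lists "file paths"; check README.md for the actual field names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple


def load_metadata(root: Path) -> List[Dict[str, Any]]:
    """Load metadata.json from the dataset root.

    Assumption: the file is either a JSON list of records or a JSON
    object whose values are the records.
    """
    with open(root / "metadata.json", encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, dict):
        return list(data.values())
    return list(data)


def iter_articles(
    root: Path, text_key: str = "text_path"
) -> Iterator[Tuple[Dict[str, Any], str]]:
    """Yield (metadata record, article text) pairs.

    `text_key` is a hypothetical field name for the relative path of the
    extracted text file; adjust it to match the real metadata schema.
    Records without a path, or whose file is missing, are skipped.
    """
    for record in load_metadata(root):
        rel_path = record.get(text_key)
        if not rel_path:
            continue
        text_file = root / rel_path
        if not text_file.is_file():
            continue
        # errors="replace" tolerates stray bytes left over from PDF extraction.
        yield record, text_file.read_text(encoding="utf-8", errors="replace")
```

A typical use would be `for record, text in iter_articles(Path("Alkaloids-dataset")): ...`, where the directory name is whatever folder the download was unpacked into. Yielding pairs lazily avoids holding all 5,813 article texts in memory at once.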
Provider:
Mendeley Data
Created:
2025-04-07