yashm/pubmed-bioinformatics-abstracts_all_years
Source: Hugging Face. Updated: 2026-04-23. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/yashm/pubmed-bioinformatics-abstracts_all_years
Description:
PubMed Bioinformatics Abstracts (All Years) is a curated dataset of bioinformatics research abstracts fetched from PubMed via the NCBI Entrez API, covering publications from 1990 through 2026. It is designed for LLM fine-tuning, biomedical NLP, and scientific text analysis. Each record has the following fields: PubMed ID (pmid), article title (title), full abstract text (abstract), publication year (year), journal name (journal), DOI (doi), source (always pubmed), and the time slice used during collection (slice). Only articles with non-empty abstracts are included. The data were collected by querying NCBI PubMed through the Entrez E-utilities (via Biopython) with the query bioinformatics[MeSH Terms] AND hasabstract[text] AND journal article[pt]. The description gives the date range for this query as 2017–2019, which conflicts with the stated 1990–2026 coverage and presumably refers to a single collection slice. The dataset is released under the CC BY-NC-SA 4.0 license, and users must comply with NCBI/NLM usage policies.
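The collection step described above could look roughly like the sketch below. It is not the dataset author's script. The query string and the 2017–2019 slice come from the description, while the function names, the `retmax` default, and the use of publication date (`pdat`) as the date filter are assumptions made for illustration. The network call needs Biopython and an email address, which NCBI asks E-utilities users to supply.

```python
from typing import List

# Query string quoted from the dataset description.
BASE_QUERY = "bioinformatics[MeSH Terms] AND hasabstract[text] AND journal article[pt]"


def search_params(start_year: int, end_year: int, retmax: int = 100) -> dict:
    """Build ESearch keyword arguments for one time slice (hypothetical helper)."""
    if start_year > end_year:
        raise ValueError("start_year must not exceed end_year")
    return {
        "db": "pubmed",
        "term": BASE_QUERY,
        "datetype": "pdat",  # assumption: slices are defined by publication date
        "mindate": str(start_year),
        "maxdate": str(end_year),
        "retmax": retmax,
    }


def search_pmids(start_year: int, end_year: int, email: str, retmax: int = 100) -> List[str]:
    """Return PubMed IDs matching the query in the slice (requires network access)."""
    # Imported lazily so search_params() can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # identifies the caller to NCBI
    handle = Entrez.esearch(**search_params(start_year, end_year, retmax))
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return list(result["IdList"])
```

Calling `search_pmids(2017, 2019, email="you@example.org")` would return up to `retmax` PubMed IDs for that slice. Fetching the abstracts themselves would be a separate step (for example with `Entrez.efetch`), and it is not shown here.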
Provider:
yashm



