
PubMed Research Database: Biomedical Citations

Snowflake | Updated 2025-07-01 | Indexed 2025-07-02
Download link:
https://app.snowflake.com/marketplace/listing/GZ2FWZ5FDBS
Resource description:
Accelerate innovation and decision-making with instant access to the world’s most trusted biomedical research. This dataset offers the complete, continuously updated PubMed library, ready to power strategic insights across R&D, medical affairs, data science, and academic research. Instead of waiting months for data collection and preparation, your teams can immediately explore high-quality, structured biomedical literature. Uncover emerging scientific trends, identify thought leaders, analyze research funding patterns, and connect publications with therapeutic areas or chemical compounds, all within a single, easy-to-use data product.

This comprehensive resource includes core publication metadata, author affiliations, journal information, standardized medical topics (MeSH), chemical substances, research grants, citations, and more, empowering your organization to make smarter, faster, evidence-based decisions.

Key Features & Data Tables

This dataset is normalized into a relational schema, allowing you to easily join tables and perform complex queries. It includes:

  • ARTICLES: Core metadata for each publication, including titles, abstracts, publication dates, and unique identifiers (DOI, PMC ID).
  • AUTHORS: Detailed information on every author, including their affiliations, ORCID identifiers, and the order of their contribution.
  • JOURNALS: A dedicated table for journal metadata, including titles, ISSN, and country of publication.
  • MeSH HEADINGS: Medical Subject Headings (MeSH) assigned to each article, providing a standardized vocabulary for powerful topical queries and trend analysis.
  • CHEMICALS: A catalog of chemical compounds and substances discussed in the literature.
  • GRANTS: Detailed information on funding, including grant IDs, agencies, and country of origin, allowing you to track research funding and impact.
  • KEYWORDS: Author-submitted keywords for deeper content analysis.
  • REFERENCES & CITATIONS: A complete list of citations for each article, enabling network analysis and impact assessment.
  • PUBLICATION TYPES: Categorization of each record (e.g., Journal Article, Review, Case Report).
  • EXTERNAL DATA LINKS: Connections to supplementary data in other repositories like Figshare.

Use Cases for Enterprise & HCLS

  • Pharmaceutical & Biotech R&D:
      • Identify emerging research trends and novel therapeutic targets.
      • Conduct competitive intelligence by tracking publications from other organizations.
      • Find Key Opinion Leaders (KOLs) and potential research collaborators.
  • Medical Affairs & HEOR:
      • Perform large-scale literature reviews for Health Economics and Outcomes Research (HEOR).
      • Track the publication of clinical trial results and real-world evidence.
  • Data Science & AI/ML:
      • Build knowledge graphs to map relationships between genes, diseases, and chemical compounds.
      • Train Natural Language Processing (NLP) models on a massive corpus of structured and unstructured text.
  • Academic & Research Institutions:
      • Analyze publication trends and institutional research output.
      • Discover funding opportunities and track the impact of grants.
Provider:
The Data Developers LLC
Created:
2025-07-01
Summary of Original Information

PubMed Research Database: Biomedical Citations

Overview

  • Provider: The Data Developers LLC
  • Access: Free, unlimited access
  • Data update frequency: Monthly
  • Geographic coverage: Global
  • Cloud region availability: AWS US East (Ohio)

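Dataset Description

  • Provides the complete, continuously updated PubMed library to support healthcare and life sciences (HCLS) analytics and AI models.
  • Includes core publication metadata, author affiliations, journal information, standardized medical topics (MeSH), chemical substances, research grants, citations, and more.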

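Key Features and Data Tables

  • ARTICLES: Core publication metadata, including titles, abstracts, publication dates, and unique identifiers (DOI, PMC ID).
  • AUTHORS: Author details, including affiliations, ORCID identifiers, and order of contribution.
  • JOURNALS: Journal metadata, including titles, ISSN, and country of publication.
  • MeSH HEADINGS: Medical Subject Headings assigned to each article, for topical queries and trend analysis.
  • CHEMICALS: A catalog of chemical compounds and substances discussed in the literature.
  • GRANTS: Funding details, including grant IDs, agencies, and country of origin.
  • KEYWORDS: Author-submitted keywords for deeper content analysis.
  • REFERENCES & CITATIONS: The complete list of citations for each article, supporting network analysis and impact assessment.
  • PUBLICATION TYPES: Categorization of each record (e.g., Journal Article, Review, Case Report).
  • EXTERNAL DATA LINKS: Links to supplementary data in other repositories (such as Figshare).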

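Business Needs

  • Patient 360: Enrich patient profiles with the latest clinical evidence.
  • Life Sciences Commercialization: Accelerate market strategy and identify key medical experts.
  • Quantitative Analysis: Support complex quantitative models and forecast research trends.
  • Real World Data (RWD): Validate insights derived from real-world data.
  • Drug Repurposing and Discovery: Discover drug repurposing opportunities.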

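Data Dictionary (Partial)

ARTICLES table

  • ABSTRACT: Full abstract of the article's content.
  • ARTICLE_TITLE: Full title of the article or publication.
  • COI_STATEMENT: Conflict-of-interest statement for the article.
  • DOI: Digital Object Identifier of the article.
  • JOURNAL_TITLE: Full title of the journal.
  • LANGUAGE: Primary language of the article's content.
  • OWNER: Organization that owns the citation record.
  • PAGINATION: Page range of the article.
  • PMC_ID: Unique identifier of the full-text article in PubMed Central.
  • PUBLICATION_DAY: Day of publication of the article.
  • Also described, without column names in this excerpt: the article's publication year, publication month or season, and publication date.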

Usage Examples

Identify Key Opinion Leaders (KOLs) in schizophrenia research

    -- Rank authors by the number of distinct articles indexed under
    -- the MeSH descriptor 'Schizophrenia'.
    SELECT a.last_name, a.fore_name, a.affiliation,
           COUNT(DISTINCT a.pubmed_id) AS publication_count
    FROM authors a
    JOIN mesh_headings mh ON a.pubmed_id = mh.pubmed_id
    WHERE mh.descriptor_name = 'Schizophrenia'
    GROUP BY a.last_name, a.fore_name, a.affiliation
    ORDER BY publication_count DESC
    LIMIT 20;
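
To try the KOL query without a Snowflake account, the following is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite only stands in for Snowflake here, the table definitions contain just the columns the query touches, and every row is invented for illustration. The helper name `top_authors` is hypothetical. The descriptor is passed as a bound parameter, which avoids quoting mistakes in the string literal.

```python
import sqlite3

# Toy stand-ins for the AUTHORS and MESH_HEADINGS tables.
# Only the columns used by the query are modelled; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (
        pubmed_id INTEGER, last_name TEXT, fore_name TEXT, affiliation TEXT
    );
    CREATE TABLE mesh_headings (pubmed_id INTEGER, descriptor_name TEXT);
    INSERT INTO authors VALUES
        (1, 'Doe', 'Jane', 'Example University'),
        (2, 'Doe', 'Jane', 'Example University'),
        (2, 'Roe', 'Sam',  'Sample Institute'),
        (3, 'Roe', 'Sam',  'Sample Institute');
    INSERT INTO mesh_headings VALUES
        (1, 'Schizophrenia'),
        (1, 'Antipsychotic Agents'),
        (2, 'Schizophrenia'),
        (3, 'Depression');
""")

# Same shape as the listing's query; the descriptor is a bound parameter (?).
KOL_QUERY = """
    SELECT a.last_name, a.fore_name, a.affiliation,
           COUNT(DISTINCT a.pubmed_id) AS publication_count
    FROM authors a
    JOIN mesh_headings mh ON a.pubmed_id = mh.pubmed_id
    WHERE mh.descriptor_name = ?
    GROUP BY a.last_name, a.fore_name, a.affiliation
    ORDER BY publication_count DESC
    LIMIT 20
"""


def top_authors(connection, descriptor):
    """Return (last_name, fore_name, affiliation, publication_count) rows."""
    return connection.execute(KOL_QUERY, (descriptor,)).fetchall()


rows = top_authors(conn, "Schizophrenia")
for row in rows:
    print(row)
```

Because the query groups on name and affiliation, the same person listed under two affiliation strings would be counted as two authors. Grouping on an ORCID identifier, where one is present, may give cleaner counts.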

Identify the top funded research areas

    -- Count distinct NIH-funded articles per MeSH descriptor.
    SELECT mh.descriptor_name,
           COUNT(DISTINCT g.pubmed_id) AS funded_publication_count
    FROM grants g
    JOIN mesh_headings mh ON g.pubmed_id = mh.pubmed_id
    WHERE g.agency ILIKE '%National Institutes of Health%'
       OR g.agency ILIKE '%NIH%'
    GROUP BY mh.descriptor_name
    ORDER BY funded_publication_count DESC
    LIMIT 20;
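
The funding query can be exercised the same way. This sketch again assumes SQLite as a stand-in for Snowflake and uses invented rows. SQLite has no `ILIKE`, so the sketch uses `LIKE`, which SQLite treats as case-insensitive for ASCII text by default. The second sort key is an addition that makes the toy output deterministic. One article carries two grant rows to show why the query counts `DISTINCT` article IDs.

```python
import sqlite3

# Toy stand-ins for the GRANTS and MESH_HEADINGS tables; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grants (pubmed_id INTEGER, agency TEXT);
    CREATE TABLE mesh_headings (pubmed_id INTEGER, descriptor_name TEXT);
    INSERT INTO grants VALUES
        (1, 'NIH HHS'),
        (1, 'NIH HHS'),
        (2, 'national institutes of health'),
        (3, 'Wellcome Trust');
    INSERT INTO mesh_headings VALUES
        (1, 'Neoplasms'),
        (2, 'Neoplasms'),
        (2, 'Immunotherapy'),
        (3, 'Malaria');
""")

# LIKE replaces Snowflake's ILIKE (SQLite's LIKE ignores ASCII case by default).
# The descriptor_name tie-breaker is an assumption added for stable ordering.
FUNDING_QUERY = """
    SELECT mh.descriptor_name,
           COUNT(DISTINCT g.pubmed_id) AS funded_publication_count
    FROM grants g
    JOIN mesh_headings mh ON g.pubmed_id = mh.pubmed_id
    WHERE g.agency LIKE ? OR g.agency LIKE ?
    GROUP BY mh.descriptor_name
    ORDER BY funded_publication_count DESC, mh.descriptor_name
    LIMIT 20
"""

patterns = ("%National Institutes of Health%", "%NIH%")
rows = conn.execute(FUNDING_QUERY, patterns).fetchall()
for descriptor, count in rows:
    print(descriptor, count)
```

Article 1 has two grant rows, yet it contributes only once to the 'Neoplasms' count because the aggregate is over distinct `pubmed_id` values.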

Discover emerging keywords in high-impact journals

    -- Count author keywords in selected journals, skipping keywords that
    -- already appear as a major MeSH topic on the same article.
    SELECT k.keyword_text, COUNT(*) AS keyword_frequency
    FROM keywords k
    JOIN articles a ON k.pubmed_id = a.pubmed_id
    WHERE (a.journal_title ILIKE '%Nature%'
           OR a.journal_title ILIKE '%Science%'
           OR a.journal_title ILIKE '%Lancet%'
           OR a.journal_title ILIKE '%New England Journal of Medicine%')
      AND NOT EXISTS (
          SELECT 1
          FROM mesh_headings mh
          WHERE mh.pubmed_id = k.pubmed_id
            AND mh.descriptor_name = k.keyword_text
            AND mh.major_topic_yn = 'Y'
      )
    GROUP BY k.keyword_text;
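
The `NOT EXISTS` anti-join in the keyword query is the least obvious part, so here is a small sketch of it on invented data, once more with SQLite standing in for Snowflake and `LIKE` standing in for `ILIKE`. The `ORDER BY` clause is an addition for readable output and is not part of the listing's query.

```python
import sqlite3

# Toy stand-ins for ARTICLES, KEYWORDS and MESH_HEADINGS; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (pubmed_id INTEGER, journal_title TEXT);
    CREATE TABLE keywords (pubmed_id INTEGER, keyword_text TEXT);
    CREATE TABLE mesh_headings (
        pubmed_id INTEGER, descriptor_name TEXT, major_topic_yn TEXT
    );
    INSERT INTO articles VALUES
        (1, 'Nature Medicine'),
        (2, 'The Lancet'),
        (3, 'Journal of Examples');
    INSERT INTO keywords VALUES
        (1, 'CRISPR screening'),
        (1, 'Schizophrenia'),
        (2, 'CRISPR screening'),
        (2, 'Organoids'),
        (3, 'CRISPR screening');
    INSERT INTO mesh_headings VALUES
        (1, 'Schizophrenia', 'Y'),
        (2, 'Organoids', 'N');
""")

JOURNAL_PATTERNS = [
    "%Nature%",
    "%Science%",
    "%Lancet%",
    "%New England Journal of Medicine%",
]
# One "LIKE ?" placeholder per journal pattern, joined with OR.
journal_filter = " OR ".join("a.journal_title LIKE ?" for _ in JOURNAL_PATTERNS)

# ORDER BY is an assumption added here; the listing's query stops at GROUP BY.
query = f"""
    SELECT k.keyword_text, COUNT(*) AS keyword_frequency
    FROM keywords k
    JOIN articles a ON k.pubmed_id = a.pubmed_id
    WHERE ({journal_filter})
      AND NOT EXISTS (
          SELECT 1
          FROM mesh_headings mh
          WHERE mh.pubmed_id = k.pubmed_id
            AND mh.descriptor_name = k.keyword_text
            AND mh.major_topic_yn = 'Y'
      )
    GROUP BY k.keyword_text
    ORDER BY keyword_frequency DESC, k.keyword_text
"""

rows = conn.execute(query, JOURNAL_PATTERNS).fetchall()
for keyword, frequency in rows:
    print(keyword, frequency)
```

In the toy data, 'Schizophrenia' is dropped for article 1 because it is already a major MeSH topic there, while 'Organoids' survives because its MeSH row is not flagged as major. Substring patterns such as `%Science%` also match any other journal whose title contains that word, so an exact list of journal titles may be a safer filter.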

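Other Related Datasets

  • Global Protein Knowledgebase: Comprehensive sequence and functional annotation for the life sciences.
  • ChEMBL Bioactivity & Drug Discovery: Curated drug interaction and target data.
  • Biomedical Articles Corpus: A comprehensive data product for biomedical research.
  • FDA DailyMed Drug Labels: U.S. drug label data.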

Contact

  • Sales: marketplace.snowflake@thedatadevelopers.com
  • Support: support.marketplace@thedatadevelopers.com
Curated Summary
Dataset Introduction
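Background and Challenges
Background Overview
This dataset provides the complete, continuously updated PubMed biomedical literature library as structured tables covering articles, authors, journals, MeSH headings, chemicals, grants, and more. It supports R&D, medical affairs, data science, and academic research, with uses that include trend analysis, competitive intelligence, knowledge graph construction, and evidence-based decision-making.
The content above was collected and summarized by 遇见数据集.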