
Universal Protein Knowledgebase

Snowflake | Updated 2025-07-10 | Indexed 2025-07-11
Download link:
https://app.snowflake.com/marketplace/listing/GZ2FWZ5FDBW
Resource description:
The Global Protein Knowledgebase from The Data Developers is a definitive, AI-ready protein intelligence source delivered directly into your Snowflake environment. It is designed to eliminate hundreds of hours of data preparation by providing a pre-built, expert-validated solution that transforms complex UniProt data into a query-ready relational format.

We perform the ongoing integration, cleaning, and harmonization of UniProt, making its rich data instantly accessible. This provides a unified view across multiple, logically linked tables, allowing your team to immediately explore:

- **Core Protein Identity:** Access primary and secondary accession numbers, recommended protein names, and EC numbers in the proteins_data table.
- **Deep Functional Annotations:** The comments_data table provides extensive details on protein function, including disease associations (disease_id, disease_description), subcellular locations, biochemical reactions (reaction), kinetic parameters, and tissue specificity.
- **Genetic Information:** Connect proteins to their genetic origins with gene_name and synonyms from the genes_data table.
- **Literature Evidence:** Seamlessly link proteins to their scientific evidence with direct access to publication titles, authors, and PubMed / PubMed Central IDs (pmid) in the references_data table.
- **Sequence & Features:** Analyze protein sequences, stored in sequence_data, and investigate specific domains, sites, and regions with coordinates from the features_data table.
- **Integrated Database Cross-References:** Leverage pre-joined links to dozens of other critical life science databases via the database_references_data table.

**Key Value & Features:**

- **Accelerated AI & Analytics:** Utilize structured data containing full protein sequences (sequence_data) and detailed annotations (features_data, comments_data) to build superior ML models and run complex bioinformatics pipelines.
- **Rich, Queryable Context:** Instantly join tables to connect proteins to diseases, genetic origins, cellular functions, and supporting literature from PubMed Central, all within a single, powerful SQL query.
- **Zero ETL & Maintenance:** Through Snowflake Secure Data Sharing, you get instant, zero-copy access to a live dataset that is always current. This removes the need to build, manage, or maintain data pipelines.
- **Intuitive Relational Schema:** Data is organized into clear, relational tables (e.g., proteins_data, comments_data, genes_data, references_data), making it easy for data scientists and analysts to drive discovery.

**Top Use Case: Accelerated Drug Target Validation**

Rapidly identify and validate novel drug targets. By joining protein data (proteins_data) with disease associations (comments_data) and literature evidence (references_data), researchers can quickly pinpoint proteins implicated in specific diseases and assess the strength of the supporting scientific evidence, dramatically shortening the early phases of drug discovery.
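The drug-target join described above can be sketched locally before touching the live share. The example below uses Python's built-in sqlite3 module with a tiny in-memory stand-in for three of the listed tables. Only the table names and the columns disease_id, disease_description, and pmid come from the listing; the join key `accession`, the columns `recommended_name` and `title`, and every sample row are assumptions made for illustration, so the real schema should be checked in Snowflake before adapting the query.

```python
import sqlite3

# In-memory stand-in for three tables named in the listing.
# ASSUMPTION: the join key `accession` and the columns `recommended_name`
# and `title` are hypothetical. All rows are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proteins_data (accession TEXT PRIMARY KEY, recommended_name TEXT);
CREATE TABLE comments_data (accession TEXT, disease_id TEXT, disease_description TEXT);
CREATE TABLE references_data (accession TEXT, pmid TEXT, title TEXT);
INSERT INTO proteins_data VALUES
    ('X0001', 'Example kinase'),
    ('X0002', 'Example transporter');
INSERT INTO comments_data VALUES
    ('X0001', 'DI-0001', 'Example disease A'),
    ('X0002', NULL, NULL);
INSERT INTO references_data VALUES
    ('X0001', '111', 'Paper one'),
    ('X0001', '222', 'Paper two'),
    ('X0002', '333', 'Paper three');
""")

# Proteins with a disease association, ranked by how many distinct
# publications support them (a rough proxy for evidence strength).
QUERY = """
SELECT p.accession, p.recommended_name, c.disease_id,
       COUNT(DISTINCT r.pmid) AS n_publications
FROM proteins_data AS p
JOIN comments_data AS c ON c.accession = p.accession
LEFT JOIN references_data AS r ON r.accession = p.accession
WHERE c.disease_id IS NOT NULL
GROUP BY p.accession, p.recommended_name, c.disease_id
ORDER BY n_publications DESC
"""

def disease_targets(connection):
    """Return (accession, name, disease_id, n_publications) rows."""
    return connection.execute(QUERY).fetchall()

rows = disease_targets(conn)
for row in rows:
    print(row)
```

Counting distinct PubMed IDs is only one possible way to gauge "strength of evidence"; the listing does not say how that assessment is meant to be made.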
Provider:
The Data Developers LLC
Created:
2025-07-10
Summary of original information

Universal Protein Knowledgebase dataset overview

Basic dataset information

  • Provider: The Data Developers LLC
  • Trial: 7-day free trial
  • Update frequency: monthly
  • Time coverage: last 6 months
  • Cloud region availability: AWS US West (Oregon)

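Dataset description

  • Core content: a global protein knowledgebase providing AI-ready protein intelligence data
  • Data source: integrated, cleaned, and harmonized UniProt data
  • Key characteristics:
    • Pre-built, expert-validated solution
    • Query-ready relational format
    • Unified view across multiple, logically linked tables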

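Table structure

  • proteins_data: core protein identity
  • comments_data: deep functional annotations
  • genes_data: genetic information
  • references_data: literature evidence
  • sequence_data: protein sequences
  • features_data: sequence features
  • database_references_data: integrated database cross-references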
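
As an illustration of how sequence_data and features_data might be combined, the sketch below slices the residues covered by each feature out of the parent sequence. It assumes feature coordinates are 1-based and inclusive, as in UniProt. The column names `accession`, `sequence`, `feature_type`, `start_pos`, and `end_pos` are hypothetical, and the sequence is a made-up placeholder; it runs against an in-memory SQLite database rather than the actual Snowflake share.

```python
import sqlite3

# ASSUMPTION: all column names below are hypothetical; the listing only
# names the two tables. The sequence and features are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence_data (accession TEXT PRIMARY KEY, sequence TEXT);
CREATE TABLE features_data (
    accession TEXT, feature_type TEXT, start_pos INTEGER, end_pos INTEGER);
INSERT INTO sequence_data VALUES ('X0001', 'MKTAYIAKQR');
INSERT INTO features_data VALUES
    ('X0001', 'Domain', 3, 6),
    ('X0001', 'Site', 10, 10);
""")

# substr() is 1-based and takes a length, so an inclusive [start, end]
# range has length end - start + 1.
feature_rows = conn.execute("""
SELECT f.feature_type, f.start_pos, f.end_pos,
       substr(s.sequence, f.start_pos, f.end_pos - f.start_pos + 1) AS residues
FROM features_data AS f
JOIN sequence_data AS s ON s.accession = f.accession
ORDER BY f.start_pos
""").fetchall()

for feature_type, start, end, residues in feature_rows:
    print(f"{feature_type} {start}-{end}: {residues}")
```

Snowflake's SUBSTR also uses a 1-based start position and a length argument, so the same expression should carry over once the real column names are substituted.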

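Key value and features

  • Accelerated AI and analytics: structured data containing full protein sequences and detailed annotations
  • Rich, queryable context: instantly connect proteins to diseases, genetic origins, cellular functions, and more
  • Zero ETL and maintenance: live access through Snowflake Secure Data Sharing
  • Intuitive relational schema: clearly organized data that is easy for data scientists and analysts to use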

Main use cases

  • Accelerated drug target validation
  • AI-driven drug repurposing
  • Life sciences commercialization
  • Biomarker-driven clinical trial design

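Usage examples

  1. Prioritize under-characterized disease targets

    • Identify human proteins associated with disease
    • The example query includes fields such as protein entry name, recommended protein name, and protein existence evidence
  2. Comparative proteomics for host-pathogen or evolutionary studies

    • Extract and compare protein information by organism and detailed taxonomic lineage
    • The example query includes fields such as protein entry name, recommended protein name, and organism name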
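
The listing's actual example queries are not reproduced on this page, so the following is only a rough sketch of what the two usage examples might look like. It uses an in-memory SQLite database. The column names `accession`, `entry_name`, `recommended_name`, `protein_existence`, and `organism_name` are guesses based on the field descriptions above, and all rows are invented placeholders.

```python
import sqlite3

# ASSUMPTION: column names are inferred from the field descriptions in the
# usage examples, not taken from the real schema. Rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proteins_data (
    accession TEXT PRIMARY KEY, entry_name TEXT, recommended_name TEXT,
    protein_existence TEXT, organism_name TEXT);
CREATE TABLE comments_data (accession TEXT, disease_id TEXT);
INSERT INTO proteins_data VALUES
    ('X0001', 'EX1_HUMAN', 'Example protein 1', 'Evidence at protein level', 'Homo sapiens'),
    ('X0002', 'EX2_HUMAN', 'Example protein 2', 'Predicted', 'Homo sapiens'),
    ('X0003', 'EX3_MOUSE', 'Example protein 3', 'Evidence at protein level', 'Mus musculus');
INSERT INTO comments_data VALUES
    ('X0001', 'DI-0001'), ('X0002', 'DI-0002'), ('X0003', 'DI-0003');
""")

def disease_proteins(connection, organism="Homo sapiens"):
    """Example 1: disease-associated proteins for one organism."""
    sql = """
    SELECT DISTINCT p.entry_name, p.recommended_name, p.protein_existence
    FROM proteins_data AS p
    JOIN comments_data AS c ON c.accession = p.accession
    WHERE p.organism_name = ? AND c.disease_id IS NOT NULL
    ORDER BY p.entry_name
    """
    return connection.execute(sql, (organism,)).fetchall()

def proteins_per_organism(connection):
    """Example 2: a simple per-organism count as a comparison baseline."""
    sql = """
    SELECT organism_name, COUNT(*) AS n_proteins
    FROM proteins_data
    GROUP BY organism_name
    ORDER BY organism_name
    """
    return connection.execute(sql).fetchall()

human_hits = disease_proteins(conn)
organism_counts = proteins_per_organism(conn)
print(human_hits)
print(organism_counts)
```

Filtering or sorting on the protein existence field is one plausible way to surface under-characterized targets, since entries with weaker existence evidence have typically been studied less. The listing does not confirm that this is how its own example query works.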

Contact

  • Sales: marketplace.snowflake@thedatadevelopers.com
  • Support: support.marketplace@thedatadevelopers.com
Collected summary
Dataset introduction
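Background and challenges
Background overview
Universal Protein Knowledgebase is an AI-ready protein knowledgebase built on UniProt. It delivers pre-cleaned, integrated data as relational tables in the Snowflake environment, covering protein identity, functional annotations, genetic information, and literature evidence. It supports zero-ETL access and rich queries, and aims to accelerate bioinformatics analyses such as drug target validation by helping researchers quickly find a protein's disease associations and the supporting scientific evidence.
The above content was collected and summarized by 遇见数据集.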