Universal Protein Knowledgebase
Source: Snowflake · Updated 2025-07-10 · Indexed 2025-07-11
Download link:
https://app.snowflake.com/marketplace/listing/GZ2FWZ5FDBW
Resource description:
The Global Protein Knowledgebase from The Data Developers is a definitive, AI-ready protein intelligence source delivered directly into your Snowflake environment. It is designed to eliminate hundreds of hours of data preparation by providing a pre-built, expert-validated solution that transforms complex UniProt data into a query-ready relational format.
We perform the ongoing integration, cleaning, and harmonization of UniProt, making its rich data instantly accessible. This provides a unified view across multiple, logically linked tables, allowing your team to immediately explore:
- **Core Protein Identity:** Access primary and secondary accession numbers, recommended protein names, and EC numbers in the proteins_data table.
- **Deep Functional Annotations:** The comments_data table provides extensive details on protein function, including disease associations (disease_id, disease_description), subcellular locations, biochemical reactions (reaction), kinetic parameters, and tissue specificity.
- **Genetic Information:** Connect proteins to their genetic origins with gene_name and synonyms from the genes_data table.
- **Literature Evidence:** Link proteins to their scientific evidence with direct access to publication titles, authors, and PubMed / PubMed Central IDs (pmid) in the references_data table.
- **Sequence & Features:** Analyze protein sequences, stored in sequence_data, and investigate specific domains, sites, and regions with coordinates from the features_data table.
- **Integrated Database Cross-References:** Follow pre-joined links to dozens of other life science databases via the database_references_data table.
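The linked-table layout described in the list above can be pictured with a small local stand-in. The sketch below uses Python's built-in sqlite3 module rather than Snowflake. Only the table names and the gene_name and synonyms columns come from the listing; the `accession` join key, the other column names, and every row are hypothetical placeholders, so the real schema may differ.

```python
import sqlite3

# Toy stand-in for the shared tables. Table names, gene_name and synonyms
# come from the listing; the `accession` key, all other columns, and all
# rows are hypothetical placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proteins_data (
        accession TEXT PRIMARY KEY, recommended_name TEXT, ec_number TEXT);
    CREATE TABLE genes_data (
        accession TEXT, gene_name TEXT, synonyms TEXT);
    CREATE TABLE features_data (
        accession TEXT, feature_type TEXT, start_pos INTEGER, end_pos INTEGER);

    INSERT INTO proteins_data VALUES ('X00001', 'Example kinase', '2.7.11.1');
    INSERT INTO genes_data VALUES ('X00001', 'EXK1', 'EXKA');
    INSERT INTO features_data VALUES
        ('X00001', 'Domain', 10, 120),
        ('X00001', 'Active site', 57, 57);
""")


def protein_overview(conn, accession):
    """Return (name, gene, feature_type, start, end) rows for one protein."""
    return conn.execute(
        """
        SELECT p.recommended_name, g.gene_name,
               f.feature_type, f.start_pos, f.end_pos
        FROM proteins_data AS p
        JOIN genes_data    AS g ON g.accession = p.accession
        JOIN features_data AS f ON f.accession = p.accession
        WHERE p.accession = ?
        ORDER BY f.start_pos
        """,
        (accession,),
    ).fetchall()


rows = protein_overview(conn, "X00001")
for row in rows:
    print(row)
```

Against the real share, the same SELECT would be run in Snowflake after checking the actual column names and join key in the listing's schema.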
**Key Value & Features:**
- **Accelerated AI & Analytics:** Utilize structured data containing full protein sequences (sequence_data) and detailed annotations (features_data, comments_data) to build superior ML models and run complex bioinformatics pipelines.
- **Rich, Queryable Context:** Instantly join tables to connect proteins to diseases, genetic origins, cellular functions, and supporting literature from PubMed Central—all within a single, powerful SQL query.
- **Zero ETL & Maintenance:** Through Snowflake Secure Data Sharing, you get instant, zero-copy access to a live dataset that is always current. This removes the need to build, manage, or maintain data pipelines.
- **Intuitive Relational Schema:** Data is organized into clear, relational tables (e.g., proteins_data, comments_data, genes_data, references_data) making it easy for data scientists and analysts to drive discovery.
**Top Use Case: Accelerated Drug Target Validation**
Rapidly identify and validate novel drug targets. By joining protein data (proteins_data) with disease associations (comments_data) and literature evidence (references_data), researchers can quickly pinpoint proteins implicated in specific diseases and assess the strength of the supporting scientific evidence, dramatically shortening the early phases of drug discovery.
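The three-way join behind this use case might look roughly like the sketch below, again using sqlite3 as a local stand-in. The disease_id, disease_description, and pmid columns are named in the listing; the `accession` join key, the other columns, and all rows are assumptions made for illustration. Counting distinct PubMed IDs per protein is only a crude proxy for evidence strength, since a protein's references are not necessarily about the disease in question.

```python
import sqlite3

# Hypothetical miniature of proteins_data, comments_data and references_data.
# The `accession` key, recommended_name, title, and all rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proteins_data (accession TEXT PRIMARY KEY, recommended_name TEXT);
    CREATE TABLE comments_data (
        accession TEXT, disease_id TEXT, disease_description TEXT);
    CREATE TABLE references_data (accession TEXT, pmid TEXT, title TEXT);

    INSERT INTO proteins_data VALUES
        ('X00001', 'Example kinase'),
        ('X00002', 'Example transporter');
    INSERT INTO comments_data VALUES
        ('X00001', 'DI-0001', 'Placeholder disease A'),
        ('X00002', 'DI-0001', 'Placeholder disease A');
    INSERT INTO references_data VALUES
        ('X00001', '1001', 'Placeholder paper 1'),
        ('X00001', '1002', 'Placeholder paper 2'),
        ('X00002', '1003', 'Placeholder paper 3');
""")


def rank_targets(conn, disease_id):
    """List proteins annotated with a disease, most-cited first."""
    return conn.execute(
        """
        SELECT p.accession, p.recommended_name,
               COUNT(DISTINCT r.pmid) AS n_papers
        FROM proteins_data AS p
        JOIN comments_data AS c ON c.accession = p.accession
        LEFT JOIN references_data AS r ON r.accession = p.accession
        WHERE c.disease_id = ?
        GROUP BY p.accession, p.recommended_name
        ORDER BY n_papers DESC, p.accession
        """,
        (disease_id,),
    ).fetchall()


ranked = rank_targets(conn, "DI-0001")
for accession, name, n_papers in ranked:
    print(accession, name, n_papers)
```

The LEFT JOIN keeps disease-associated proteins that have no references at all, which is often where under-studied candidates sit.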
Provider:
The Data Developers LLC
Created:
2025-07-10
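Summary of original information
Universal Protein Knowledgebase dataset overview
Basic dataset information
- Provider: The Data Developers LLC
- Trial: 7-day free trial
- Update frequency: Monthly
- Time coverage: Last 6 months
- Cloud region availability: AWS US West (Oregon)
Dataset description
- Core content: A global protein knowledgebase providing AI-ready protein intelligence data
- Data source: Integration, cleaning, and harmonization of UniProt data
- Key characteristics:
  - Pre-built, expert-validated solution
  - Query-ready relational format
  - Unified view across multiple, logically linked tables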
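Table structure
- proteins_data: Core protein identity
- comments_data: Deep functional annotations
- genes_data: Genetic information
- references_data: Literature evidence
- sequence_data: Protein sequences
- features_data: Sequence features
- database_references_data: Integrated database cross-references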
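Key value and features
- Accelerated AI and analytics: Structured data containing full protein sequences and detailed annotations
- Rich, queryable context: Instantly connect proteins to diseases, genetic origins, cellular functions, and more
- Zero ETL and maintenance: Live access through Snowflake Secure Data Sharing
- Intuitive relational schema: Clearly organized data that is easy for data scientists and analysts to use
Main use cases
- Accelerated drug target validation
- AI-driven drug repurposing
- Life sciences commercialization
- Biomarker-driven clinical trial design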
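Usage examples
- Prioritizing under-characterized disease targets
  - Identify human proteins associated with disease
  - The example query includes fields such as protein entry name, recommended protein name, and protein existence evidence
- Comparative proteomics for host-pathogen or evolutionary studies
  - Extract and compare protein information by organism and detailed taxonomic lineage
  - The example query includes fields such as protein entry name, recommended protein name, and organism name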
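The listing does not reproduce the example queries themselves. As a rough illustration of the comparative proteomics case, the sketch below compares proteins across two organisms and filters by lineage, using sqlite3 as a local stand-in. It assumes the organism and lineage fields sit in proteins_data under the hypothetical column names entry_name, recommended_name, organism_name, and lineage; all rows are placeholders.

```python
import sqlite3

# Hypothetical single-table miniature; column names and rows are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proteins_data (
        accession TEXT PRIMARY KEY, entry_name TEXT, recommended_name TEXT,
        organism_name TEXT, lineage TEXT);

    INSERT INTO proteins_data VALUES
        ('X00001', 'EXK1_HOST', 'Example kinase',
         'Host species', 'Eukaryota; Metazoa'),
        ('X00002', 'EXT1_HOST', 'Example transporter',
         'Host species', 'Eukaryota; Metazoa'),
        ('X00003', 'EXK1_PATH', 'Example kinase',
         'Pathogen species', 'Bacteria; Placeholder phylum');
""")


def shared_protein_names(conn, organism_a, organism_b):
    """Recommended names annotated in both organisms, with both entry names."""
    return conn.execute(
        """
        SELECT a.recommended_name, a.entry_name, b.entry_name
        FROM proteins_data AS a
        JOIN proteins_data AS b ON b.recommended_name = a.recommended_name
        WHERE a.organism_name = ? AND b.organism_name = ?
        ORDER BY a.recommended_name
        """,
        (organism_a, organism_b),
    ).fetchall()


def count_by_lineage_prefix(conn, prefix):
    """Count proteins whose lineage string starts with the given prefix."""
    return conn.execute(
        "SELECT COUNT(*) FROM proteins_data WHERE lineage LIKE ? || '%'",
        (prefix,),
    ).fetchone()[0]


shared = shared_protein_names(conn, "Host species", "Pathogen species")
print(shared)
print(count_by_lineage_prefix(conn, "Eukaryota"))
```

Matching on recommended name is a deliberately simple comparison; a real study would more likely compare sequences or orthology cross-references.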
Contact
- Sales: marketplace.snowflake@thedatadevelopers.com
- Support: support.marketplace@thedatadevelopers.com
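Collected summary
Dataset introduction

Background and challenges
Background overview
Universal Protein Knowledgebase is an AI-ready protein knowledgebase built on UniProt. It delivers pre-cleaned, integrated data as relational tables in a Snowflake environment, covering protein identity, functional annotations, genetic information, and literature evidence. It supports zero-ETL access and rich querying, and aims to accelerate bioinformatics analyses such as drug target validation by helping researchers quickly find the diseases and scientific evidence associated with a protein.
The summary above was collected and generated by 遇见数据集.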



