Multiscale Interactome Data -- Revised
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/7819797
Resource description:
Original GitHub Repository: https://github.com/snap-stanford/multiscale-interactome
Forked GitHub Repository: https://github.com/callahantiff/multiscale-interactome/tree/development
This repository stores a revised version of the original data used in the publication "Identification of disease treatment mechanisms through the multiscale interactome". As described in the original repository's README, the original data can be downloaded directly from: http://snap.stanford.edu/multiscale-interactome/data/data.tar.gz.
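For readers who want the original archive locally, the following is a minimal sketch of fetching and unpacking it with the Python standard library. The URL is the one given above; the helper names `download` and `extract` and the local paths in the usage comment are hypothetical and are not part of either repository.

```python
import tarfile
import urllib.request
from pathlib import Path

# URL quoted in the description above.
DATA_URL = "http://snap.stanford.edu/multiscale-interactome/data/data.tar.gz"


def download(url: str, target: Path) -> Path:
    """Fetch url to target unless a file is already present at target."""
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(target))
    return target


def extract(archive: Path, dest: Path) -> list:
    """Unpack a .tar.gz archive into dest and return the member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names


# Usage (hypothetical local paths; requires network access):
#   archive = download(DATA_URL, Path("downloads/data.tar.gz"))
#   members = extract(archive, Path("downloads/original"))
```

Skipping the download when the file already exists keeps repeated runs from fetching the archive again.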
Description of Original Data
Drug-Protein (n=8,568):
Source(s):
DrugBank (v5.1.1; 2018; drugbank_approved_target_uniprot_links.csv)
Drug Repurposing Hub (September 2018)
Processing: Map UniProt identifiers to Entrez Gene identifiers using HUGO (October 2018), and map drug identifiers to DrugBank identifiers.
Filtering: Filter proteins to only keep those that appear in the Protein-Protein edge set.
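The filtering rule above can be sketched as a set-membership check. This is an illustration under assumptions, not the authors' code: edges are modelled as plain `(source, protein)` tuples, and the identifiers in the toy data are made up.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def proteins_in_ppi(ppi_edges: Iterable[Edge]) -> Set[str]:
    """Collect every protein id that appears at either end of a protein-protein edge."""
    nodes: Set[str] = set()
    for a, b in ppi_edges:
        nodes.add(a)
        nodes.add(b)
    return nodes


def filter_to_ppi(edges: Iterable[Edge], ppi_nodes: Set[str]) -> List[Edge]:
    """Keep (source, protein) edges whose protein is present in the PPI node set."""
    return [(src, protein) for src, protein in edges if protein in ppi_nodes]


# Toy data with made-up identifiers.
ppi = [("1", "2"), ("2", "3")]
drug_protein = [("DB00001", "2"), ("DB00002", "9")]
kept = filter_to_ppi(drug_protein, proteins_in_ppi(ppi))
print(kept)
```

Only the first toy edge survives, because protein "9" never occurs in the toy protein-protein edges.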
Disease-Protein (n=25,212):
Source(s): DisGeNet (March 2018)
Filtering: Keep only expert-curated gene-disease associations: (1) exclude disease-gene relationships that are inferred, based on orthology, animal models, or literature mining; (2) remove therapeutic disease-gene associations; and (3) remove disease-gene relationships whose protein does not appear in the Protein-Protein edge set.
Protein-Protein (n=387,626):
Source(s):
BioGRID (v3.5.178; November 2019; BIOGRID-ORGANISM-Homo_sapiens-3.5.178.tab)
Database of Interacting Proteins (February 2017; Hsapi20170205.txt). All experimental methods are included.
Human Reference Protein Interactome Mapping Project. Four networks derived from high-throughput yeast two-hybrid assays.
Menche 2015 (PMID:25700523). Compiles different types of physical protein-protein interactions.
Processing: Map protein identifiers to Entrez Gene identifiers using HUGO (BioGRID and Database of Interacting Proteins only).
Filtering: Keep only human proteins with physical interactions supported by direct experimental evidence (no genetic or indirect interactions).
Protein-Biological Process (n=34,777):
Source(s): Gene Ontology (human; February 2018)
Processing: Use master ids provided by GOATOOLS (v0.8.4)
Filtering: Allow only the evidence codes EXP, IDA, IMP, IGI, HTP, HDA, HMP, and HGI. Exclude any protein-biological process annotations inferred from physical interactions, gene expression patterns, phylogenetically inferred annotations or computational analyses, automatic annotations (i.e., based on author statements, curator inference, electronic annotation), and those with no biological data.
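The evidence-code rule above amounts to an allow-list. A minimal sketch follows, assuming each annotation is a dictionary with hypothetical keys `gene`, `go_id`, and `evidence`; the toy records are invented and are not drawn from the dataset.

```python
# Evidence codes listed in the filtering rule above.
ALLOWED_EVIDENCE = frozenset({"EXP", "IDA", "IMP", "IGI", "HTP", "HDA", "HMP", "HGI"})


def keep_experimental(annotations):
    """Return only the annotations whose evidence code is in the allow-list."""
    return [a for a in annotations if a["evidence"] in ALLOWED_EVIDENCE]


# Toy annotations (invented records; the key names are an assumption).
annotations = [
    {"gene": "g1", "go_id": "GO:toy1", "evidence": "IDA"},  # direct assay: kept
    {"gene": "g2", "go_id": "GO:toy2", "evidence": "IPI"},  # physical interaction: dropped
    {"gene": "g3", "go_id": "GO:toy3", "evidence": "IEA"},  # electronic annotation: dropped
    {"gene": "g4", "go_id": "GO:toy4", "evidence": "HDA"},  # high-throughput direct assay: kept
]
kept_annotations = keep_experimental(annotations)
print([a["gene"] for a in kept_annotations])
```

An allow-list is used rather than a block-list so that any evidence code not named in the rule is excluded by default.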
Biological Process-Biological Process (n=22,545):
Source(s): Gene Ontology (human; February 2018) + Gene Ontology Plus (human version; July 2020)
Filtering: Allow the following relationship types: regulates, positively regulates, negatively regulates, part of, is a. Only consider BPs associated with at least one drug target or disease protein (directly or through children).
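One way to read the rule above is as a two-step filter: drop edges with other relationship types, then keep only processes that are annotated directly or that sit above an annotated process. The sketch below makes several assumptions that the description does not confirm: edges are `(child, relation, parent)` tuples, relation names use underscore spellings, and every allowed relation is treated as a child-to-parent link when propagating upward. The term ids are toy values.

```python
from collections import defaultdict, deque

# Underscore spellings of the five allowed relationship types (assumed naming).
ALLOWED_RELATIONS = frozenset(
    {"regulates", "positively_regulates", "negatively_regulates", "part_of", "is_a"}
)


def relevant_processes(edges, seed_terms):
    """Return the seed terms plus every term reachable by walking child -> parent."""
    parents = defaultdict(set)
    for child, relation, parent in edges:
        if relation in ALLOWED_RELATIONS:
            parents[child].add(parent)
    seen = set(seed_terms)
    queue = deque(seed_terms)
    while queue:
        term = queue.popleft()
        for parent in parents.get(term, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def filter_bp_edges(edges, seed_terms):
    """Keep allowed-relation edges whose two endpoints are both relevant processes."""
    keep = relevant_processes(edges, seed_terms)
    return [
        (c, r, p) for c, r, p in edges
        if r in ALLOWED_RELATIONS and c in keep and p in keep
    ]


# Toy ontology; "BP:child" stands for a process annotated to a drug target or disease protein.
edges = [
    ("BP:child", "is_a", "BP:mid"),
    ("BP:mid", "part_of", "BP:root"),
    ("BP:other", "is_a", "BP:root"),   # no annotated descendant: dropped
    ("BP:child", "has_part", "BP:x"),  # relation not allowed: dropped
]
filtered = filter_bp_edges(edges, {"BP:child"})
print(filtered)
```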
⚠️ Updates to Original Implementation ⚠️
Modifications to Original Data and Code
Ensured every entry had a valid identifier and label
Reconciled duplicate gene entries (i.e., gene identifiers that had been merged)
Changed genes are listed in: resources/data/updated_gene_identifiers.xlsx
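Reconciling merged gene identifiers can be sketched as remapping each edge endpoint and dropping the duplicates that result. This is a sketch under assumptions: the layout of updated_gene_identifiers.xlsx is not described here, so the `merged` dictionary (retired id to current id) and all identifiers below are hypothetical.

```python
def reconcile(edges, merged):
    """Rewrite retired gene ids to their current id and drop duplicate edges.

    edges  : iterable of (gene_a, gene_b) tuples
    merged : dict mapping a retired id to the id it was merged into (hypothetical)
    """
    seen = set()
    result = []
    for a, b in edges:
        edge = (merged.get(a, a), merged.get(b, b))
        if edge not in seen:  # the first occurrence wins, preserving input order
            seen.add(edge)
            result.append(edge)
    return result


# Toy example: "old1" was merged into "new1", so the first two edges collapse.
merged = {"old1": "new1"}
gene_edges = [("old1", "g2"), ("new1", "g2"), ("g3", "g4")]
reconciled = reconcile(gene_edges, merged)
print(reconciled)
```

The sketch applies the mapping once; if an identifier has been merged more than once, the mapping would need to be resolved to its final target first.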
Created: 2023-04-12



