BigGIM-DrugResponse KG version 3.0

Name: BigGIM-DrugResponse KG version 3.0
Creator: Qin, Guangrong
Published: 2024-12-23 00:00:00
License: Not specified

Source: Figshare. Updated 2024-12-23; indexed 2026-04-08.

Download link:

https://figshare.com/articles/dataset/BigGIM-DrugResponse_KG/28079609/2

Resource overview:

Description: BigGIM-DrugResponse KG, created and maintained by the Multiomics Provider, comprises multiple knowledge graphs that are either aggregated from public knowledge resources or derived as empirical associations from omics, drug screening, and functional screening datasets. It covers concepts such as diseases, drugs or chemicals, genes, and proteins.

BigGIM-DrugResponse KG includes both empirical findings from large datasets and content aggregated from public knowledge resources or the literature. It expands the previous BigGIM from gene_gene_interactions only to further biological concepts, such as diseases, drugs, and tissue types. The node categories include Genes, Drugs (SmallMolecules), and Diseases. The edge categories (predicates) include Gene ~ gene_associated_with_condition ~ Disease and Gene (aspect qualifier: Genetic variants) ~ associated_with_sensitivity_to ~ SmallMolecule (aspect qualifier: IC50), among others.

BigGIM II KGs include the following components:

BigGIM II - Drug response (mutation-based): To understand how different gene mutations are associated with different drug responses, we extracted the BigGIM II - Drug response KG (mutation-based). Whole-exome sequencing data (gene mutation data) and drug screening data from the GDSC study (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505) were used for knowledge graph construction. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Based on the mutation status of each gene, cell lines in each tumor type were grouped into either the wild-type group or the mutated group. The significance of the difference in drug response IC50 values between the two groups was tested using Student's t-test.
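The grouping and testing step described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record layout, the function name mutation_ttests, and the min_group_size guard are hypothetical, and the IC50 values are made up. It relies on scipy.stats.ttest_ind with equal_var=True, which performs the pooled-variance Student's t-test.

```python
from collections import defaultdict
from scipy.stats import ttest_ind


def mutation_ttests(records, min_group_size=2):
    """records: iterable of (tumor_type, cell_line, is_mutated, ic50)
    for a single gene-drug pair. Returns {tumor_type: (t_stat, p_value)}.

    min_group_size is an assumed guard, not a threshold from the source.
    """
    groups = defaultdict(lambda: {"mut": [], "wt": []})
    for tumor_type, _cell_line, is_mutated, ic50 in records:
        groups[tumor_type]["mut" if is_mutated else "wt"].append(ic50)

    results = {}
    for tumor_type, g in groups.items():
        # Skip tumor types where either group is too small to test.
        if len(g["mut"]) < min_group_size or len(g["wt"]) < min_group_size:
            continue
        # equal_var=True selects the classic Student's t-test.
        t_stat, p_value = ttest_ind(g["mut"], g["wt"], equal_var=True)
        results[tumor_type] = (float(t_stat), float(p_value))
    return results


# Made-up IC50 values, purely for illustration.
example = [
    ("LUAD", "c1", True, 1.2), ("LUAD", "c2", True, 1.5), ("LUAD", "c3", True, 1.1),
    ("LUAD", "c4", False, 3.0), ("LUAD", "c5", False, 3.4), ("LUAD", "c6", False, 2.9),
    ("BRCA", "c7", True, 2.0), ("BRCA", "c8", False, 2.1),
]
results = mutation_ttests(example)
```

In this toy input the BRCA groups contain one cell line each, so only LUAD is tested. A negative t statistic here means the mutated group has lower IC50 values than the wild-type group.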
The effect size was measured using the following equation:

    import numpy as np

    def effect_size(x, y):
        n1 = len(x)  # x represents a vector of IC50 values for the mutated samples
        n2 = len(y)  # y represents a vector of IC50 values for the wt samples
        # Pooled standard deviation; np.std defaults to ddof=0 (population standard deviation)
        s = np.sqrt(((n1 - 1) * np.std(x) * np.std(x) + (n2 - 1) * np.std(y) * np.std(y)) / (n1 + n2 - 2))
        d = (np.mean(x) - np.mean(y)) / s
        return d

Associations with a P-value smaller than 0.05 were ingested into the knowledge graph.

BigGIM II - Drug response (gene expression based): To understand how different gene expressions are associated with different drug responses, we developed the BigGIM II - Drug response KG (expression-based). Spearman correlations were calculated between gene expression (RMA gene expression values) and drug response Area Under the Curve (AUC) for cell lines in different tumor types in the GDSC project. Correlations were calculated only when more than 6 cell lines had both drug response data and gene expression values. For each tumor type, each record includes the gene (symbol), drug name (to be mapped to drug IDs), correlation, p-value, sample size, and tumor type. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Please cite the original paper if the results are used (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505).

BigGIM II - Gene Gene interaction (expression-based) KG: BigGIM II - Gene Gene interactions (expression-based) is an updated version of BigGIM I, with updated datasets from tumor-based co-expression or tissue-based gene expression. Update from the tumor-based co-expression: With the updated datasets from the TCGA pancan study, we used the new version of the gene expression values from the ISB-CGC PanCancer Atlas BigQuery Tables (pancancer-atlas.Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered) to generate the graph (BigGIM II - Gene Gene interaction (expr-expr)).
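For the expression-based drug response component described above, the per-tumor-type Spearman correlation with its minimum-sample rule might look like the sketch below. The function name, the dictionary inputs keyed by cell line ID, and the toy values are assumptions made for illustration. Only the "more than 6 samples" rule and the use of Spearman correlation come from the description.

```python
from scipy.stats import spearmanr

MIN_SAMPLES = 6  # the description requires MORE than 6 shared cell lines


def expression_response_correlation(expression, auc):
    """expression, auc: dicts keyed by cell line ID for one gene, one drug,
    and one tumor type. Returns (rho, p_value, n), or None when too few
    cell lines have both measurements.
    """
    shared = sorted(set(expression) & set(auc))
    if len(shared) <= MIN_SAMPLES:
        return None
    rho, p_value = spearmanr(
        [expression[c] for c in shared],
        [auc[c] for c in shared],
    )
    return float(rho), float(p_value), len(shared)


# Toy data: expression rises while AUC falls across 8 cell lines.
expr = {f"c{i}": float(i) for i in range(8)}
auc = {f"c{i}": 1.0 - 0.1 * i for i in range(8)}
full = expression_response_correlation(expr, auc)

# With only 6 shared cell lines the pair is skipped.
small = expression_response_correlation(
    {k: expr[k] for k in list(expr)[:6]}, auc
)
```

Returning the sample size alongside the coefficient and p-value mirrors the fields the description lists for each record.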
Gene co-expression correlations were computed using Pearson correlation. Only genes with expression observations in at least 25 samples were considered. The coefficient and p-value were derived from the Pearson correlation analysis.

BigGIM II - gene_associated_with_condition_Disease (Disease-Gene): This component describes which genes are frequently mutated in different tumor types, using the gene mutation data from TCGA-pancancer. We used TCGA data to quantify gene mutation frequency at the patient level. Genes with a mutation frequency greater than 5% and more than 5 mutated samples were selected. To narrow the list further, we filtered it according to the cancer driver genes identified in PMID:29625053. Only driver genes were exposed to the MultiomicsBigGIM_DrugResponse_KP as of the version updated in Sep 2022.
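The Disease-Gene selection rule above (mutation frequency greater than 5%, more than 5 mutated samples, and membership in a driver gene list) can be illustrated with a short sketch. The function name, input layout, patient IDs, and the driver set used here are hypothetical. In practice the driver list would come from the publication cited as PMID:29625053.

```python
def frequently_mutated_drivers(mutated_patients, n_patients, driver_genes,
                               min_frequency=0.05, min_mutated=5):
    """mutated_patients: {gene: set of patient IDs mutated in that gene}
    for one tumor type. n_patients: patients profiled in that tumor type.
    driver_genes: set of driver gene symbols.
    Returns {gene: mutation_frequency} for genes passing all filters.
    """
    selected = {}
    for gene, patients in mutated_patients.items():
        n_mut = len(patients)
        freq = n_mut / n_patients
        # Both thresholds are strict ("greater than"), following the text.
        if freq > min_frequency and n_mut > min_mutated and gene in driver_genes:
            selected[gene] = freq
    return selected


# Made-up cohort of 100 patients, purely for illustration.
mutated = {
    "TP53": {f"p{i}" for i in range(40)},  # 40% of patients
    "KRAS": {f"p{i}" for i in range(5)},   # exactly 5%, fails both strict cutoffs
    "TTN": {f"p{i}" for i in range(30)},   # frequent, but not in the driver set
}
drivers = {"TP53", "KRAS"}  # hypothetical driver set for this example
selected = frequently_mutated_drivers(mutated, 100, drivers)
```

Counting distinct patient IDs per gene, rather than individual mutation calls, keeps the frequency at the patient level as the description specifies.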
