
BigGIM-DrugResponse KG

DataCite Commons | Updated 2024-12-23 | Indexed 2025-04-19
Download link:
https://figshare.com/articles/dataset/BigGIM-DrugResponse_KG/28079609/1
Resource description:
BigGIM-DrugResponse KG, created and maintained by the Multiomics Provider, comprises multiple knowledge graphs. These are either aggregated from public knowledge resources or literature, or derived as empirical associations from omics, drug screening, and functional screening datasets. It covers concepts such as diseases, drugs or chemicals, genes, and proteins. It expands the previous BigGIM from only gene_gene_interactions to more biological concepts, such as diseases, drugs, and tissue types.

The categories of nodes include Genes, Drugs (SmallMolecules), and Diseases. The categories of edges (predicates) include Gene ~ gene_associated_with_condition ~ Disease and Gene (aspect qualifier: Genetic variants) ~ associated_with_sensitivity_to ~ SmallMolecule (aspect qualifier: IC50), among others.

BigGIM II KGs include the following components:

BigGIM II - Drug response (mutation-based): To understand how different gene mutations are associated with different drug responses, we extracted the BigGIM II - Drug response KG (mutation-based). Whole-exome sequencing data (gene mutation data) and drug screening data from the GDSC study (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505) were used for knowledge graph construction. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Based on the mutation status of each gene, cell lines in each tumor type were grouped into either the wild-type group or the mutated group. The significance of the difference in drug response IC50 values between the two groups was tested using Student's t-test.
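The group comparison described above can be sketched roughly as follows. This is a minimal illustration, not the providers' actual pipeline: the function name `mutation_ttest`, the input layout (one IC50 value and one mutation flag per cell line), and the toy numbers are all assumptions made for the example. It relies on `scipy.stats.ttest_ind` with `equal_var=True`, which corresponds to the classic Student's t-test.

```python
import numpy as np
from scipy import stats


def mutation_ttest(ic50, is_mutated):
    """Student's t-test on IC50 values: mutated vs. wild-type cell lines.

    ic50: IC50 values, one per cell line (hypothetical input layout).
    is_mutated: booleans, True where the gene is mutated in that cell line.
    """
    ic50 = np.asarray(ic50, dtype=float)
    is_mutated = np.asarray(is_mutated, dtype=bool)
    mutated = ic50[is_mutated]
    wild_type = ic50[~is_mutated]
    # equal_var=True selects the pooled-variance (Student's) t-test.
    t_stat, p_value = stats.ttest_ind(mutated, wild_type, equal_var=True)
    return float(t_stat), float(p_value)


# Toy data for one gene in one tumor type (made-up numbers, not GDSC values).
ic50 = [1.2, 0.8, 1.1, 0.9, 3.4, 3.1, 2.9, 3.6]
is_mutated = [True, True, True, True, False, False, False, False]
t_stat, p_value = mutation_ttest(ic50, is_mutated)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

In practice this would be repeated for every gene, drug, and tumor type combination.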
The effect size was measured using the following equation:

    import numpy as np

    def effect_size(x, y):
        n1 = len(x)  # x represents a vector of IC50 values for the mutated samples
        n2 = len(y)  # y represents a vector of IC50 values for the wt samples
        s = np.sqrt(((n1 - 1) * (np.std(x)) * (np.std(x)) + (n2 - 1) * (np.std(y)) * (np.std(y))) / (n1 + n2 - 2))
        d = (np.mean(x) - np.mean(y)) / s
        return d

Associations with a P-value smaller than 0.05 were ingested into the knowledge graph.

BigGIM II - Drug response (gene expression based): To understand how different gene expressions are associated with different drug responses, we developed the BigGIM II - Drug response KG (expression-based). Spearman correlations were calculated between gene expression (RMA gene expression values) and drug response Area Under the Curve (AUC) for cell lines in different tumor types in the GDSC project. A correlation was calculated only if more than 6 cell lines had both drug response data and a gene expression value. For each tumor type, the output includes the gene (symbol), drug name (still to be mapped to drug IDs), correlation, p-value, sample size, and tumor type. Resource data were downloaded from https://www.cancerrxgene.org/gdsc1000/GDSC1000_WebResources/Home.html. Please cite the original paper if the results are used (Iorio et al., 2016, Cell 166, 740–754. PMID:27397505).

BigGIM II - Gene Gene interaction (expression-based) KG: BigGIM II - Gene Gene interactions (expression-based) is an updated version of BigGIM I, with updated datasets from tumor-based co-expression or tissue-based gene expression. Update from the tumor-based co-expression: With the updated datasets from the TCGA pancan study, we used the new version of the gene expression values from the ISB-CGC PanCancer Atlas BigQuery Tables (pancancer-atlas.Filtered.EBpp_AdjustPANCAN_IlluminaHiSeq_RNASeqV2_genExp_filtered) to generate the graph (BigGIM II - Gene Gene interaction (expr-expr)).
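As an illustration of the Drug response (gene expression based) component described earlier in this section, the correlation step and its sample-size rule might look like the sketch below. The function name, the use of NaN for missing measurements, and the toy values are assumptions for the example; the description states only that Spearman correlation was used and that more than 6 samples were required.

```python
import numpy as np
from scipy import stats

MIN_SAMPLES = 7  # "more than 6 samples", as stated in the description


def expression_response_correlation(expression, auc):
    """Spearman correlation between one gene's expression and one drug's AUC.

    Inputs are aligned per cell line; NaN marks a missing measurement
    (an assumed convention). Returns (rho, p_value, n), or None when too
    few cell lines have both values.
    """
    expression = np.asarray(expression, dtype=float)
    auc = np.asarray(auc, dtype=float)
    paired = ~np.isnan(expression) & ~np.isnan(auc)
    n = int(paired.sum())
    if n < MIN_SAMPLES:
        return None
    rho, p_value = stats.spearmanr(expression[paired], auc[paired])
    return float(rho), float(p_value), n


# Toy values for one gene/drug pair in one tumor type (not GDSC data).
expression = [5.1, 6.3, 7.0, 4.2, 8.8, np.nan, 6.9, 7.7]
auc = [0.91, 0.80, 0.72, 0.95, 0.55, 0.60, 0.74, 0.66]
result = expression_response_correlation(expression, auc)
too_few = expression_response_correlation([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(result, too_few)
```

Returning `None` for under-sampled pairs keeps them out of the graph entirely, rather than recording a correlation that rests on too few cell lines.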
Gene co-expression correlations were computed using Pearson correlation. Only genes with expression observations in at least 25 samples were taken into consideration. The coefficient and p-value were derived from the Pearson correlation analysis.

BigGIM II - gene_associated_with_condition_Disease (Disease-Gene): This component describes which genes are frequently mutated in different tumor types, using the gene mutation data from TCGA-pancancer. We used TCGA data to quantify the gene mutation frequency at the patient level. Genes with a mutation frequency greater than 5% and more than 5 mutated samples were selected. To narrow down the gene list further, we filtered it against the cancer driver genes identified in PMID:29625053. Only driver genes were exposed to the MultiomicsBigGIM_DrugResponse_KP as of the version updated in Sep 2022.
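The Disease-Gene selection rule above (mutation frequency greater than 5%, more than 5 mutated samples, then a driver gene filter) can be sketched as follows. The function name, the dictionary input layout, and the toy cohort are hypothetical. The `driver_genes` set is only a stand-in for the driver gene list from PMID:29625053, not a copy of it.

```python
def frequently_mutated_driver_genes(mutated_patients, n_patients, driver_genes,
                                    min_frequency=0.05, min_mutated=5):
    """Keep driver genes mutated in > min_frequency of patients and in
    > min_mutated patients.

    mutated_patients: dict mapping gene symbol -> set of patient IDs
    carrying a mutation in that gene (hypothetical layout).
    """
    selected = {}
    for gene, patients in mutated_patients.items():
        n_mutated = len(patients)
        frequency = n_mutated / n_patients
        if (frequency > min_frequency
                and n_mutated > min_mutated
                and gene in driver_genes):
            selected[gene] = frequency
    return selected


# Toy cohort of 100 patients for one tumor type (made-up data).
mutated_patients = {
    "TP53": {f"P{i}" for i in range(40)},    # 40% mutated, in driver set
    "KRAS": {f"P{i}" for i in range(6)},     # 6% and 6 samples, in driver set
    "GENE_X": {f"P{i}" for i in range(5)},   # exactly 5%: fails both cutoffs
    "TTN": {f"P{i}" for i in range(30)},     # frequent, but not in driver set
}
driver_genes = {"TP53", "KRAS"}  # stand-in list for illustration only
selected = frequently_mutated_driver_genes(mutated_patients, 100, driver_genes)
print(sorted(selected))
```

Both cutoffs are strict inequalities here, following the "greater than" wording of the description.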
Provider:
figshare
Created:
2024-12-22