Sensitivity of genes, molecular pathways and disease related categories to chemical exposures
Source: DataCite Commons (updated 2025-04-01, indexed 2025-04-16)
Download link:
https://data.mendeley.com/datasets/65fcympd2j
Description:
The goal of this project is to identify, in an unbiased way, the molecular mechanisms that are sensitive to chemical exposures. The results have been published as a preprint on preprints.org (doi: 10.20944/preprints202006.0261.v1).
The data files described below correspond to the major steps of our analysis:
1. Annotated chemical-gene interactions.xlsx
Data on chemical-gene interactions from high-throughput toxicogenomic experiments with human, mouse, or rat cells and tissues were extracted from the Comparative Toxicogenomic Database (CTD, http://ctdbase.org/) on August 24, 2018. Genes not present in the genomes of all three species were filtered out. Chemical compounds were annotated for their major uses with information from Wikipedia, PubChem, and PubMed. Based on this textual annotation, every compound was assigned one to three annotation terms from the following list: pharmaceutical, recreational drug, research, warfare, endobiotic, agricultural, cosmetics, environment, food components, industrial, and pollutant. All contributors annotated an equal number of chemicals, and AS checked every annotation to ensure a consistent approach. The resulting dataset includes 591,084 individual chemical-gene interactions.
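The two curation rules in this step (keep only genes present in all three genomes; give each compound one to three terms from a fixed vocabulary) can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the `Interaction` record layout is hypothetical, and it assumes genes are identified by a symbol that is directly comparable across human, mouse, and rat, with each genome supplied as a set of such symbols.

```python
from dataclasses import dataclass

# Use categories listed in the dataset description.
ALLOWED_TERMS = {
    "pharmaceutical", "recreational drug", "research", "warfare",
    "endobiotic", "agricultural", "cosmetics", "environment",
    "food components", "industrial", "pollutant",
}


@dataclass(frozen=True)
class Interaction:
    # Hypothetical record layout; the spreadsheet columns may differ.
    chemical: str
    gene: str    # assumed to be a symbol comparable across species
    action: str  # free-text interaction action, kept as-is


def filter_shared_genes(interactions, human, mouse, rat):
    """Keep only interactions whose gene occurs in all three genome sets."""
    shared = human & mouse & rat
    return [ix for ix in interactions if ix.gene in shared]


def validate_annotation(terms):
    """Check that a compound carries one to three distinct allowed terms."""
    unique = set(terms)
    return 1 <= len(unique) <= 3 and unique <= ALLOWED_TERMS
```

For example, with `GENE_A` present in all three sets and `GENE_B` missing from one, `filter_shared_genes` keeps only the `GENE_A` records; `validate_annotation(["industrial", "pollutant"])` passes, while an empty list or a term outside the vocabulary fails.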
2. Number of chemical-gene interactions per gene.xlsx
The dataset created in the previous step was used to determine the number of chemical-gene annotations for every gene, including the total number as well as the numbers of activating and suppressive annotations. We hypothesize that the number of chemical-gene interactions can be used as a measure of a gene's sensitivity to chemical exposures.
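A minimal sketch of the per-gene tally, assuming each record reduces to a `(gene, action)` pair and that activating and suppressive annotations can be recognized by action labels beginning with "increases" and "decreases". Those labels are an assumption for illustration; the exact wording in the spreadsheet may differ. Annotations matching neither prefix still count toward the total.

```python
from collections import Counter, defaultdict


def count_per_gene(records):
    """Tally total, activating and suppressive annotations per gene.

    `records` is an iterable of (gene, action) pairs. Treating actions that
    start with "increases" as activating and "decreases" as suppressive is
    an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for gene, action in records:
        c = counts[gene]
        c["total"] += 1
        if action.startswith("increases"):
            c["activating"] += 1
        elif action.startswith("decreases"):
            c["suppressive"] += 1
    return counts


def rank_genes(counts):
    """Order genes by total count, highest first (the sensitivity proxy)."""
    return sorted(counts, key=lambda g: counts[g]["total"], reverse=True)
```

Using `Counter` means a missing category reads as 0, so genes with no suppressive annotations need no special handling.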
3. Enrichment of molecular pathways with genes sensitive to chemical exposures.xlsx
The list of genes, with the total number of chemical-gene interactions for each gene, was used as input for Gene Set Enrichment Analysis (GSEA, https://www.gsea-msigdb.org/gsea/index.jsp) against the Hallmark, KEGG, and Reactome gene set collections to identify molecular pathways highly enriched with genes sensitive to chemical exposures. We suggest that the normalized enrichment score (NES) of each enriched pathway is a measure of that pathway's sensitivity to chemical exposures.
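The analysis used the GSEA software linked above; the sketch below only re-implements the core statistic to show what the enrichment score and NES measure on a ranked gene list. It follows the weighted running-sum score of Subramanian et al. (2005) with weight exponent `p = 1`. As a simplification, the null distribution for NES comes from random gene sets of equal size (gene-set permutation). The function names and parameters are illustrative, not part of any GSEA API.

```python
import random


def enrichment_score(ranked, scores, gene_set, p=1.0):
    """Weighted running-sum enrichment score (Subramanian et al., 2005).

    `ranked` lists genes in descending order of `scores`; here the score
    would be the total number of chemical-gene interactions per gene.
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [g in gene_set for g in ranked]
    n_hit = sum(hits)
    n_miss = len(ranked) - n_hit
    norm = sum(abs(scores[g]) ** p for g, h in zip(ranked, hits) if h)
    if n_hit == 0 or n_miss == 0 or norm == 0:
        raise ValueError("gene set must partly overlap the ranked list")
    running = es = 0.0
    for g, h in zip(ranked, hits):
        # Hits step up in proportion to their weight; misses step down evenly.
        running += abs(scores[g]) ** p / norm if h else -1.0 / n_miss
        if abs(running) > abs(es):
            es = running
    return es


def normalized_es(ranked, scores, gene_set, n_perm=1000, seed=0):
    """Divide the observed ES by the mean |ES| of same-signed null scores.

    The null distribution comes from random gene sets of equal size
    (gene-set permutation), a simplification assumed for this sketch.
    """
    rng = random.Random(seed)
    observed = enrichment_score(ranked, scores, gene_set)
    size = sum(g in gene_set for g in ranked)
    null = [enrichment_score(ranked, scores, set(rng.sample(ranked, size)))
            for _ in range(n_perm)]
    same_sign = [abs(x) for x in null if x != 0 and (x > 0) == (observed > 0)]
    if not same_sign:
        raise ValueError("no null scores share the observed sign")
    return observed / (sum(same_sign) / len(same_sign))
```

A pathway whose genes cluster at the top of the interaction-count ranking gets a large positive ES. Normalizing by same-signed null scores is what makes NES comparable across gene sets of different sizes.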
4. Diseases vs. chemically sensitive KEGG pathways matrix.xlsx and
5. Diseases vs. chemically sensitive Reactome pathways matrix.xlsx
To identify disease categories that are sensitive to chemical exposures, the lists of significantly enriched KEGG and Reactome pathways (false discovery rate (FDR) q < 0.01 and normalized enrichment score (NES) > 1.9) were submitted to CTD for pathway-disease association analysis. This analysis produced matrices of the numbers of genes shared between chemically sensitive pathways and disease states. Two numeric values indicate the sensitivity of a disease state to chemical exposures: the number of inferred pathways associated with the disease state, and the sum of genes from every pathway overlapping with the disease (the number of inference genes).
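The pathway selection and the two disease-level values can be sketched as below, reading the significance cutoff as FDR q below 0.01 together with NES above 1.9. The input shapes are assumptions for illustration: GSEA results as `(name, nes, fdr_q)` tuples, and the CTD output as a mapping from disease to `{pathway: shared gene count}`. A pathway is counted as inferred for a disease when it shares at least one gene with it.

```python
def select_pathways(gsea_rows, max_fdr=0.01, min_nes=1.9):
    """Return names of pathways passing both significance cutoffs.

    `gsea_rows` is assumed to be an iterable of (name, nes, fdr_q) tuples.
    """
    return [name for name, nes, fdr in gsea_rows
            if fdr < max_fdr and nes > min_nes]


def disease_sensitivity(overlap, pathways):
    """Summarize a disease x pathway matrix of shared-gene counts.

    `overlap` maps disease -> {pathway: number of shared genes}. For each
    disease, returns (number of inferred pathways, number of inference
    genes), counting only selected pathways with at least one shared gene.
    """
    selected = set(pathways)
    summary = {}
    for disease, row in overlap.items():
        hits = {p: n for p, n in row.items() if p in selected and n > 0}
        summary[disease] = (len(hits), sum(hits.values()))
    return summary
```

Keeping the two values as a tuple lets diseases be ranked on pathway count first and inference-gene total second, though the description does not say how the authors combined them.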
Provider: Mendeley Data
Created: 2020-07-09



