Identifying Gene Set Association Enrichment Using the Coefficient of Intrinsic Dependence
Source: NIAID Data Ecosystem. Indexed: 2026-03-07
Download link:
https://figshare.com/articles/dataset/Identifying_Gene_Set_Association_Enrichment_Using_the_Coefficient_of_Intrinsic_Dependence__/652456
Description:
Gene set testing has become a focus of microarray data analysis. A gene set is a group of genes defined by a priori biological knowledge. Several statistical methods have been proposed to determine whether functional gene sets are differentially expressed (enrichment and/or deletion) across phenotypes. However, little attention has been given to the dependence structure among gene sets. This study proposes a statistical method for gene set association analysis that identifies significantly associated gene sets using the coefficient of intrinsic dependence. Simulation studies show that the proposed method outperforms conventional methods at detecting general forms of association, in terms of both type I error control and power. The coefficient of intrinsic dependence was applied to a breast cancer microarray dataset to quantify the unsupervised relationship between two sets of genes in tumor and non-tumor samples. The presence of gene set association differed across clinical cohorts. In addition, supervised learning was used to illustrate how gene sets in signal transduction pathways, or in subnetworks regulated by a set of transcription factors, can be discovered from microarray data. In conclusion, the coefficient of intrinsic dependence provides a powerful tool for detecting general types of association and can therefore be used to associate gene sets from microarray expression data. By connecting relevant gene sets, the approach has the potential to reveal underlying associations by drawing a statistically relevant network for a given population, and it can complement conventional gene set analysis.
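This record does not define the coefficient of intrinsic dependence (CID) or say how it is estimated. As a rough illustration only, the sketch below assumes the form commonly given for it, CID(Y|X) = ∫ Var[E(1{Y ≤ y} | X)] dF_Y(y) / ∫ Var[1{Y ≤ y}] dF_Y(y), and estimates it for two one-dimensional variables by grouping X into equal-frequency bins. The function name `cid`, the `n_bins` parameter, and the binning estimator are assumptions made for this example, not the authors' implementation, and the sketch does not cover the gene set (multivariate) version described in the abstract.

```python
import numpy as np

def cid(x, y, n_bins=5):
    """Binned estimate of CID(Y | X) for two 1-D samples (illustrative sketch).

    Assumes y is not constant; otherwise the denominator is zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)

    # Equal-frequency bins on X, built from the ranks of x (labels 0..n_bins-1).
    ranks = np.argsort(np.argsort(x))
    bins = ranks * n_bins // n

    # ind[i, j] = 1 if y[i] <= y[j]; each observed y[j] serves as a threshold,
    # so averaging over j approximates integration with respect to F_Y.
    ind = (y[:, None] <= y[None, :]).astype(float)
    f_all = ind.mean(axis=0)  # empirical CDF of Y at each threshold

    # Between-bin variance of the conditional CDF at each threshold.
    between = np.zeros(n)
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            f_bin = ind[mask].mean(axis=0)  # CDF of Y within this bin of X
            between += mask.mean() * (f_bin - f_all) ** 2

    total = f_all * (1.0 - f_all)  # variance of the indicator at each threshold
    return between.sum() / total.sum()

# Usage: a nonlinear, non-monotone dependence versus an independent pair.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
cid_dependent = cid(x, x ** 2)
cid_independent = cid(x, rng.normal(size=500))
print(cid_dependent, cid_independent)
```

Under these assumptions the estimate lies between 0 and 1. It should be well above zero for the quadratic relationship, where a linear correlation coefficient would be close to zero, and near zero for the independent pair. The number of bins trades resolution against noise and would need tuning on real expression data.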
Created:
2016-01-18



