Table_2_Bivariate Causal Discovery and Its Applications to Gene Expression and Imaging Data Analysis.DOCX
Source: NIAID Data Ecosystem; indexed 2026-03-10
Download link:
https://figshare.com/articles/dataset/Table_2_Bivariate_Causal_Discovery_and_Its_Applications_to_Gene_Expression_and_Imaging_Data_Analysis_DOCX/7034177
Description:
Mainstream research in genetics, epigenetics, and imaging data analysis focuses on statistical association, that is, on exploring statistical dependence between variables. Despite significant progress in genetic research, the etiology and mechanisms of complex phenotypes remain elusive. Reliance on association analysis as the major analytical platform for complex data is a key issue that hampers the theoretical development of genomic science and its application in practice. Causal inference is an essential component for discovering mechanistic relationships among complex phenotypes, and many researchers suggest making the transition from association to causation. Although causal inference plays a fundamental role in science, engineering, and biomedicine, the traditional methods require at least three variables. However, quantitative genetic analysis such as QTL, eQTL, and mQTL analysis, as well as genomic-imaging data analysis, requires exploring the causal relationship between two variables. This paper focuses on bivariate causal discovery with continuous variables. We introduce the independence of cause and mechanism (ICM) as a basic principle for causal inference, and algorithmic information theory and the additive noise model (ANM) as major tools for bivariate causal discovery. Large-scale simulations will be performed to evaluate the feasibility of the ANM for bivariate causal discovery. To further evaluate its performance for causal inference, the ANM will be applied to the construction of gene regulatory networks. The ANM will also be applied to trait-imaging data analysis to illustrate three scenarios: presence of both causation and association, presence of association in the absence of causation, and presence of causation without association between two variables. Telling cause from effect between two continuous variables using observational data is one of the fundamental and challenging problems in omics and imaging data analysis. Our preliminary simulations and real data analysis will show that ANMs will be one of the methods of choice for bivariate causal discovery in genomic and imaging data analysis.
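The ANM idea named in the description can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes a polynomial regression as the fitted mechanism and a biased HSIC statistic with a Gaussian kernel as the dependence score, and all function names (`hsic`, `anm_score`, `anm_direction`) are hypothetical. The direction whose regression residuals look more independent of the putative cause is preferred. The toy data follow Y = X^2 + noise, a case where X drives Y while their linear correlation is near zero, which mirrors the "causation without association" scenario.

```python
import numpy as np

def _gram(v):
    # Gaussian kernel matrix; bandwidth from the median of squared distances (a common heuristic).
    d2 = (v[:, None] - v[None, :]) ** 2
    sigma2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / sigma2)

def hsic(a, b):
    # Biased empirical HSIC statistic: larger values suggest stronger dependence.
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(_gram(a) @ H @ _gram(b) @ H) / n ** 2

def anm_score(cause, effect, degree=3):
    # Fit effect = f(cause) + residual with a polynomial f (an assumption of this sketch),
    # then measure how strongly the residual depends on the putative cause.
    coeffs = np.polyfit(cause, effect, degree)
    residual = effect - np.polyval(coeffs, cause)
    return hsic(cause, residual)

def anm_direction(x, y, degree=3):
    # Standardize both variables, score both directions, prefer the lower dependence score.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    s_xy = anm_score(x, y, degree)
    s_yx = anm_score(y, x, degree)
    return ("X->Y" if s_xy < s_yx else "Y->X"), s_xy, s_yx

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 300)
    y = x ** 2 + 0.3 * rng.standard_normal(300)  # X causes Y through a symmetric mechanism
    direction, s_xy, s_yx = anm_direction(x, y)
    print("Pearson r:", np.corrcoef(x, y)[0, 1])
    print("inferred direction:", direction, "scores:", s_xy, s_yx)
```

Comparing two raw scores is a simplification; a fuller analysis would attach a significance test to each direction and allow the outcome "no decision" when both or neither direction fits.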
Created:
2018-08-31



