
Estimating and Accounting for Unobserved Covariates in High-Dimensional Correlated Data

Source: DataCite Commons. Updated 2020-08-25; indexed 2024-08-18.
Download link:
https://tandf.figshare.com/articles/Estimating_and_accounting_for_unobserved_covariates_in_high_dimensional_correlated_data/12327986/2
Description:
Many high-dimensional and high-throughput biological datasets have complex sample correlation structures, which include longitudinal and multiple tissue data, as well as data with multiple treatment conditions or related individuals. These data, as well as nearly all high-throughput “omic” data, are influenced by technical and biological factors unknown to the researcher, which, if unaccounted for, can severely obfuscate estimation of and inference on the effects of interest. We therefore developed CBCV and CorrConf: provably accurate and computationally efficient methods to choose the number of, and to estimate, latent confounding factors present in high-dimensional data with correlated or nonexchangeable residuals. We demonstrate each method’s superior performance compared to other state-of-the-art methods by analyzing simulated multi-tissue gene expression data and identifying sex-associated DNA methylation sites in a real, longitudinal twin study. Supplementary materials for this article are available online.

Provider:
Taylor & Francis
Created:
2020-08-24