Replication data for: Coupling of Hidden Markov Models for the Discovery of Cis-Regulatory Modules in Multiple Species
Source: DataONE | Updated: 2015-04-11 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/sha256:2900ac543ee6f028d565c799489a0f9323fd93d7cc009da30c6af5ded30eccd1
Description:
Cis-regulatory modules (CRMs) composed of multiple transcription factor binding sites (TFBSs) control gene expression in eukaryotic genomes. Comparative genomic studies have shown that these regulatory elements are more conserved across species due to evolutionary constraints. We propose a statistical method to combine module structure and cross-species orthology in de novo motif discovery. We use a hidden Markov model (HMM) to capture the module structure in each species and couple these HMMs through multiple-species alignment. Evolutionary models are incorporated to consider correlated structures among aligned sequence positions across different species. Based on our model, we develop a Markov chain Monte Carlo approach, MultiModule, to discover CRMs and their component motifs simultaneously in groups of orthologous sequences from multiple species. Our method is tested on both simulated and biological data sets in mammals and Drosophila, where significant improvement over other motif and module discovery methods is observed.
Created:
2023-11-21



