A Feature-Based Approach to Modeling Protein–DNA Interactions
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/A_Feature_Based_Approach_to_Modeling_Protein_DNA_Interactions/150296
Resource description:
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also develop a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/.
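The central contrast in the abstract, independent per-position scoring in a PSSM versus a log-linear model whose features may span several positions, can be illustrated with a minimal Python sketch. It is not the authors' FMM software: the function names, the feature encoding (a dict mapping positions to required bases, paired with a weight), and all numeric values are hypothetical. The partition function Z is computed here by brute-force enumeration over all 4^L sequences, which is only practical for very short sites.

```python
import itertools
import math

ALPHABET = "ACGT"


def pssm_log_prob(seq, pssm):
    """Log-probability under a PSSM: each position is scored independently."""
    # pssm[i][b] is the (hypothetical) probability of base b at position i.
    return sum(math.log(pssm[i][b]) for i, b in enumerate(seq))


def feature_score(seq, features):
    """Unnormalized log-linear score: sum of weights of the features that fire."""
    # Each feature is (dict position -> base, weight). It fires only if every
    # listed position matches, so a single feature can couple several positions.
    return sum(w for cond, w in features
               if all(seq[p] == b for p, b in cond.items()))


def log_partition(length, features):
    """log Z by brute-force enumeration of all 4**length sequences."""
    scores = [feature_score("".join(s), features)
              for s in itertools.product(ALPHABET, repeat=length)]
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    return m + math.log(sum(math.exp(s - m) for s in scores))


def fmm_log_prob(seq, features):
    """log P(seq) = sum of fired feature weights - log Z."""
    return feature_score(seq, features) - log_partition(len(seq), features)
```

With `features = [({0: "A", 1: "C"}, 3.0), ({0: "G", 1: "T"}, 3.0)]`, the log-linear model gives "AC" and "GT" a higher probability than "AT" or "GC". A PSSM cannot express this pairwise dependency: any PSSM that weights A and G equally at the first position and C and T equally at the second scores all four of these sequences identically.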
Created:
2008-08-22



