File S1 - A General Pairwise Interaction Model Provides an Accurate Description of In Vivo Transcription Factor Binding Sites

Figshare · Updated 2015-12-02 · Indexed 2026-04-29

Download link:

https://figshare.com/articles/dataset/A_General_Pairwise_Interaction_Model_Provides_an_Accurate_Description_of_In_Vivo_Transcription_Factor_Binding_Sites/1057467

Description:

Supporting figures and tables.

Figure S1. Dependence of the fit on the number of ChIP sequences. For each TF, the number of available ChIP sequences is plotted against the improvement in the description of its TFBS statistics provided by the PIM as compared to the PWM model. This improvement is quantified by the ratio of the DKL between the respective model probability distributions and the experimental ones provided by the ChIP data. The improvement afforded by the PIM is clearly correlated with the number of ChIP sequences available. TFs for which the PWM description appears satisfactory (see Figure 2 of the main text) are shown in blue.

Figure S2. Comparison of the different methods to define the basins of attraction. We compare two methods for defining the basins of attraction of the PIM. Given an initial sequence, the attractor is found by iteratively changing either the nucleotide providing the strongest decrease in energy (deterministic method) or a random nucleotide providing a strict decrease in energy (random method). For the factors studied in the main text, we show the proportion of sites falling in each basin of attraction using the deterministic method or repeated trials of the random method. For these factors, the number of basins of attraction did not change, and the proportion of sites falling in each basin was well conserved.

Figure S3. Same as Figure 6 of the main text for all considered factors described by a mixture model with two or more PWMs.

Figure S4. Same as Figure 7A of the main text for the other considered factors.

Figure S5. Background correlations. (A,B,C) Heat maps showing the correlations between nucleotides in the ChIP data of the factors from the main text. Because of translation invariance, we only show the correlations between a nucleotide (rows) and its neighbors, from nearest (first four columns) to farthest (last four columns), within the binding site length. The Drosophila data show an appreciable presence of repeated sequences (of type AA, TT, CC, and GG). In the mammalian data sets, we observe the known CpG depletion. (A′,B′,C′) Corresponding heat maps showing the values of the Normalized Direct Information between pairs of nucleotides.

Figure S6. Variable spacer length. We learned a PIM for Esrrb including the flanking nucleotides on the left of the main motif. (A) The metastable states of this model show a feature not captured in the main text, where binding sites are defined symmetrically around the center of mass of the information content: a ‘CAG’ trinucleotide with variable spacer length from the main motif. This feature is apparent in the first logos shown here. (B) The contribution of this trinucleotide interaction to the Direct Information is captured through strong direct links between the flanking nucleotides, showing that the PIM is implicitly able to capture higher-order correlations. Logos from the PWM model surround the heat map for clarity.

Table S1. Comparison between initial PWMs and PWMs. Bottom rows correspond to the 6 factors that are satisfactorily described by the PWM model. Information content is in bits.

(PDF)
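The two attractor searches described for Figure S2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the energy form (single-site fields `h` plus pairwise couplings `J`, with lower energy meaning a more favorable site), the function names, and the toy parameters are all assumptions made for the example.

```python
import random
from collections import Counter
from itertools import combinations

ALPHABET = "ACGT"

def energy(seq, h, J):
    """Assumed pairwise energy: E = -sum_i h[i][a_i] - sum_{i<j} J[(i, j)][(a_i, a_j)]."""
    e = -sum(h[i][a] for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        e -= J.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return e

def downhill_moves(seq, h, J):
    """All single-nucleotide changes that strictly lower the energy."""
    e0 = energy(seq, h, J)
    moves = []
    for i in range(len(seq)):
        for a in ALPHABET:
            if a != seq[i]:
                cand = seq[:i] + a + seq[i + 1:]
                e1 = energy(cand, h, J)
                if e1 < e0:
                    moves.append((e1, cand))
    return moves

def find_attractor(seq, h, J, method="deterministic", rng=None):
    """Descend until no single-nucleotide change lowers the energy.

    deterministic: take the change with the strongest energy decrease.
    random: take a uniformly random change among those that decrease the energy.
    """
    rng = rng or random.Random()
    while True:
        moves = downhill_moves(seq, h, J)
        if not moves:
            return seq  # local minimum: this sequence is the attractor
        if method == "deterministic":
            seq = min(moves)[1]
        else:
            seq = rng.choice(moves)[1]

def basin_proportions(sites, h, J, method="deterministic", rng=None):
    """Fraction of sites that fall into each basin of attraction."""
    counts = Counter(find_attractor(s, h, J, method, rng) for s in sites)
    total = sum(counts.values())
    return {attractor: n / total for attractor, n in counts.items()}

# Hypothetical toy model of length 3 with two local minima, "AAA" and "CCC".
L = 3
h = [{"A": 0.5, "C": 0.4, "G": 0.0, "T": 0.0} for _ in range(L)]
J = {(i, j): {("A", "A"): 1.0, ("C", "C"): 1.0}
     for i, j in combinations(range(L), 2)}

sites = ["AAC", "CCG", "GGG", "ACC", "TCC"]
print(basin_proportions(sites, h, J, "deterministic"))
print(basin_proportions(sites, h, J, "random", random.Random(0)))
```

Because each accepted change strictly lowers the energy and the sequence space is finite, both searches terminate. In this toy model a site such as "CCG" can reach either minimum under the random method, so comparing the proportions across repeated random trials with the deterministic result is one way to check how well the basins are conserved.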

Created:

2015-12-02