Dataset of AlphaFold 3 and RoseTTAFold2NA Structures for Analysing the Capabilities of These Programmes in Predicting the Spacing and Orientation Preferences of Two Transcription Factors

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://zenodo.org/record/14844636

Resource description:

Description: This dataset contains a collection of predicted TF‑TF–DNA complex structures generated using two state‑of‑the‑art artificial intelligence programmes: RoseTTAFold2NA and AlphaFold 3. The structures were predicted as part of a study evaluating how well these models capture the spacing and orientation preferences in TF‑TF–DNA interactions, as derived from CAP‑SELEX experimental data. This repository is accompanied by an associated Zenodo repository [https://doi.org/10.5281/zenodo.14846538] containing the code used for analysing these data. The code repository was automatically copied by Zenodo from its GitHub repository.

Background and Rationale: Transcription factors bind to specific DNA sequences to regulate gene expression. Experimental approaches such as CAP‑SELEX have enabled the characterisation of binding preferences by providing enriched k‑mer motifs and information on preferred spatial arrangements between TF binding sites. In this study, 119 position weight matrix (PWM) models, obtained from CAP‑SELEX experiments, were used to define the preferred DNA sequences for pairs of transcription factors (e.g. PAX2 and ELK3).

Dataset Content: A total of 1,620 PDB files were generated for each programme, resulting in 3,240 predicted structures overall. The sequences used as input for AlphaFold 3 were identical to those used for RoseTTAFold2NA. For each PWM model, multiple competitive DNA sequences were generated to probe different spatial and orientational configurations.

Methodology: The prediction pipeline involved the following key steps.

Derivation of Input Sequences: Based on CAP‑SELEX data, two key k‑mers were identified for each TF. Multiple DNA sequences were constructed by varying the spacing (e.g. ±1 or ±2 nucleotides) and the orientation (e.g. swapping positions or using reverse complements) between the two k‑mers.
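
The sequence-derivation step can be illustrated with a minimal Python sketch. It is not the study's actual code. The function names, the poly-A spacer, the example k-mers and the preferred spacing are all hypothetical, and the real pipeline may add flanking bases or choose spacers differently. The sketch only shows how spacing offsets, position swaps and reverse complements combine into a set of candidate sequences.

```python
from itertools import product

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_variants(kmer_a, kmer_b, preferred_spacing,
                   deltas=(-2, -1, 0, 1, 2), spacer_base="A"):
    """Enumerate sequences that vary spacing and orientation of two k-mers.

    Keys are (gap, order, a_is_reverse_complement, b_is_reverse_complement).
    The poly-A spacer is an illustrative assumption, not the dataset's rule.
    """
    variants = {}
    for delta in deltas:
        gap = preferred_spacing + delta
        if gap < 0:
            continue  # a negative gap would mean overlapping sites; skipped here
        spacer = spacer_base * gap
        for order, a_rc, b_rc in product(("AB", "BA"), (False, True), (False, True)):
            a = reverse_complement(kmer_a) if a_rc else kmer_a
            b = reverse_complement(kmer_b) if b_rc else kmer_b
            first, second = (a, b) if order == "AB" else (b, a)
            variants[(gap, order, a_rc, b_rc)] = first + spacer + second
    return variants


# Hypothetical k-mers and spacing, for illustration only.
variants = build_variants("GTCACG", "CCGGAA", preferred_spacing=2)
preferred = variants[(2, "AB", False, False)]
```

In this sketch the entry keyed `(2, "AB", False, False)` plays the role of the preferred arrangement, and every other entry is a candidate competitive sequence.
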
Structure Prediction: The generated sequences were used as input for RoseTTAFold2NA and AlphaFold 3, resulting in a set of predicted TF‑TF–DNA complex structures.

Analysis: The predicted structures were analysed by comparing the number of contacts in the region corresponding to the preferred DNA versus the competitive DNA. A contact was defined as a pair of atoms (one from an amino acid and one from a nucleic acid) whose minimum distance is less than the threshold of 0.45 nm.

By linking experimental CAP‑SELEX data with high‑accuracy structural predictions, this resource supports a deeper understanding of the molecular mechanisms underlying transcriptional regulation. It can also be used to improve existing methods and as a dataset for further methodological development.
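
The contact definition above can be sketched in plain Python as a count of protein–DNA atom pairs closer than 0.45 nm (4.5 Å, since PDB coordinates are in ångströms). This is an illustrative reading of the definition, not the authors' analysis code; a per-residue reading that takes the minimum atom distance for each residue pair would give different counts. The function names and the `dna_region` parameter are hypothetical. The sketch assumes standard fixed-column `ATOM` records and DNA residues named DA, DC, DG and DT, and it identifies DNA positions by residue number only, ignoring chain IDs.

```python
import math

AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
DNA_RESIDUES = {"DA", "DC", "DG", "DT"}  # assumed residue naming
CONTACT_CUTOFF_ANGSTROM = 4.5  # 0.45 nm expressed in ångströms


def parse_atoms(pdb_text):
    """Split ATOM records into protein coordinates and (res_seq, xyz) DNA atoms."""
    protein, dna = [], []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        res_name = line[17:20].strip()   # columns 18-20
        res_seq = int(line[22:26])       # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if res_name in AMINO_ACIDS:
            protein.append(xyz)
        elif res_name in DNA_RESIDUES:
            dna.append((res_seq, xyz))
    return protein, dna


def count_contacts(pdb_text, dna_region=None):
    """Count protein-DNA atom pairs closer than the cutoff.

    dna_region: optional set of DNA residue numbers to restrict the count to,
    e.g. the positions covered by the preferred or the competitive site.
    """
    protein, dna = parse_atoms(pdb_text)
    return sum(
        1
        for p in protein
        for res_seq, d in dna
        if (dna_region is None or res_seq in dna_region)
        and math.dist(p, d) < CONTACT_CUTOFF_ANGSTROM
    )
```

Calling `count_contacts` once with the residue numbers of the preferred site and once with those of the competitive site gives the two numbers being compared. The all-pairs loop is quadratic in atom count, so a spatial index would be a sensible replacement for large batches.
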

Created:

2025-02-10