Perspectives on Codebook: sequence specificity of uncharacterized human transcription factors

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE275577

Description:

We describe an effort (“Codebook”) to determine the sequence specificity of 332 putative and largely uncharacterized human transcription factors (TFs), as well as 61 control TFs. Nearly 5,000 independent experiments, including in vitro and in vivo assays, produced motifs for most of the uncharacterized TFs analyzed (180, or 53%), the vast majority of which are unique to a single TF. The data highlight the extensive contribution of transposable elements to TF evolution, both in cis and trans, and identify tens of thousands of conserved, base-level binding sites in the human genome. The use of multiple platforms provides an unprecedented opportunity to benchmark and analyze TF sequence specificity, function, and evolution, as further explored in accompanying manuscripts. Over 1,421 human TFs are now associated with a DNA binding motif. Extrapolation from the Codebook benchmarking suggests that many of the binding motifs for well-studied TFs may inaccurately describe the TF’s true sequence preferences.

Protein binding microarray (PBM) experiments were performed for 173 diverse human DNA-binding proteins (462 experiments). Briefly, GST-tagged DNA-binding proteins were bound to two double-stranded 44K Agilent microarrays, each containing a different de Bruijn sequence design, in order to determine their sequence preferences. Details of the PBM protocol are described in Berger et al., Nature Biotechnology 2006.
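
The de Bruijn sequence design mentioned for the PBM arrays can be illustrated with a toy example. A de Bruijn sequence of order k over the DNA alphabet contains every possible k-mer exactly once when read cyclically, which is what lets a compact set of probes sample all k-mers evenly. The following is a minimal sketch using the standard Lyndon-word construction with k = 3; it is not the array design used in this dataset, which uses a much larger order and additional probe engineering that is not reproduced here. The function names `de_bruijn` and `kmer_counts` are illustrative assumptions.

```python
from collections import Counter


def de_bruijn(alphabet: str, k: int) -> str:
    """Return a cyclic de Bruijn sequence of order k over `alphabet`.

    Uses the standard recursive Lyndon-word construction.
    Illustrative helper; not taken from the dataset's own pipeline.
    """
    n = len(alphabet)
    a = [0] * (n * k)
    out = []

    def db(t: int, p: int) -> None:
        if t > k:
            # Emit the current prefix only when its length divides k.
            if k % p == 0:
                out.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, n):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in out)


def kmer_counts(cyclic_seq: str, k: int) -> Counter:
    """Count k-mers in a cyclic sequence by wrapping the first k-1 bases."""
    linear = cyclic_seq + cyclic_seq[:k - 1]
    return Counter(linear[i:i + k] for i in range(len(cyclic_seq)))


if __name__ == "__main__":
    k = 3  # toy order, chosen only to keep the output readable
    seq = de_bruijn("ACGT", k)
    counts = kmer_counts(seq, k)
    print(seq)
    print(len(seq), "bases;", len(counts), "distinct", f"{k}-mers")
```

Because every k-mer occurs exactly once, the signal measured across probes can be aggregated per k-mer without some k-mers being sampled more often than others, which is the property the two different array designs exploit.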

Created:

2024-11-12
