Genome-wide discovery of active regulatory elements and transcription factor footprints in Caenorhabditis elegans using DNase-seq

Indexed in NIAID Data Ecosystem on 2026-05-17

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP103095

Resource summary:

Deep sequencing of size-selected DNaseI-treated chromatin (DNase-seq) allows high-resolution measurement of chromatin accessibility to DNaseI cleavage, permitting de novo identification of active cis-regulatory modules (CRMs) and individual transcription factor (TF) binding sites. We adapted DNase-seq to nuclei isolated from C. elegans embryos and L1 arrest larvae to generate high-resolution maps of TF binding. Over half of embryonic DNaseI hypersensitive sites (DHS) were annotated in noncoding sequences (23% intergenic, 11% promoter and 21% intronic), with similar statistics in data collected from L1 arrest larvae. Noncoding DHS exhibit high evolutionary sequence conservation and are enriched in marks of enhancer activity and transcription. We validated noncoding DHS against a previously investigated set of enhancers from the myo-2, myo-3, hlh-1, elt-2 and lin-26/lir-1 gene loci and recapitulated 15 of 17 known enhancers in these loci. We then mined the DNase-seq data to identify putative active CRMs and TF footprints. Our DNase-seq data could also be used to improve predictions of tissue-specific expression compared with motifs alone. In a pilot functional test, 10 of 15 DHS from pha-4, icl-1 and ceh-13 drove reporter gene expression in transgenic C. elegans. Overall, we provide experimental annotation of 26,644 putative CRMs in the embryo containing 55,890 TF footprints, and 15,841 putative CRMs in the L1 arrest larvae containing 32,685 TF footprints.

Overall design: Embryo and L1 arrest larvae nuclei were treated with 0, 20, 40, 80, 120 or 160 U/mL DNaseI in 1X DNaseI digestion buffer (containing CaCl2, spermine, spermidine and protease inhibitor), each for 3 minutes at 37 °C. DNaseI treatment followed the conditions of the Stamatoyannopoulos lab protocol (Thurman et al. 2012). DNaseI digestion was quenched with STOP buffer containing 20 mg/mL Proteinase K, followed by overnight incubation at 55 °C.
After treatment with 45 µg/mL boiled RNase A for 30 minutes, DNA was purified and concentrated by column purification. The DNA sample was run on a 1% agarose gel and stained with SYBR Gold, and the gel piece containing DNA fragments smaller than 500 bp was isolated and purified. For each nuclei sample, we compared qPCR measurements of positive controls (known enhancers from the lin-39/ceh-13 Hox complex studied in Kuntz et al. 2008) with those of negative controls (sequences from the same study that did not drive reporter gene expression) to measure the regulatory enrichment of CRMs in the DNaseI-treated nuclei samples. qPCR primers were designed against conserved MUSSA regions of the "true positive" lin-39/ceh-13 enhancers N1, N2, N3, N4, N7, N8, N9 and N11, and of the negative-control non-enhancer regions N5 and N6, studied by Kuntz et al. (2008). qPCRs were run with genomic DNA standards for absolute quantification, with Cp values determined by the derivative method. Relative fold enrichment was computed within each sample by normalizing the measured concentration of each region to the mean of the negative controls (N5 and N6) (Figure S4). For each nuclei sample, we chose the digest from the DNaseI concentration that yielded the highest regulatory enrichment (typically 2- to 5-fold) and used it to prepare libraries for sequencing.

The libraries were multiplexed and sequenced on an Illumina HiSeq to yield 50 bp single-end reads. Raw DNaseI hypersensitive peaks were identified by detecting read enrichment in windows of 150 consecutive nucleotides using HOTSPOT, a peak caller designed specifically for DNase-seq (version 3; John et al. 2011). We filtered the raw HOTSPOT peak calls using the irreproducible discovery rate (IDR) framework developed for ENCODE, which uses a non-parametric copula mixture model to classify peaks as reproducible or irreproducible (Li et al. 2011; Landt et al. 2012).
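The qPCR normalization and digest selection described above can be sketched in Python. This is a minimal illustration, not the authors' analysis code. It assumes that each region's concentration comes from a log-linear genomic DNA standard curve (Cp versus log10 input), and it summarizes "regulatory enrichment" as the mean fold enrichment over the positive-control enhancers, a choice the description does not specify. All function names and the toy numbers are hypothetical.

```python
from statistics import mean

# Control regions from Kuntz et al. (2008), as listed in the description.
POSITIVE = ["N1", "N2", "N3", "N4", "N7", "N8", "N9", "N11"]
NEGATIVE = ["N5", "N6"]


def fit_standard_curve(log10_qty, cp):
    """Least-squares fit of Cp = slope * log10(quantity) + intercept
    (assumed log-linear standard curve from genomic DNA dilutions)."""
    mx, my = mean(log10_qty), mean(cp)
    sxx = sum((x - mx) ** 2 for x in log10_qty)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_qty, cp))
    slope = sxy / sxx
    return slope, my - slope * mx


def cp_to_quantity(cp, slope, intercept):
    """Invert the standard curve to turn a Cp value into an absolute quantity."""
    return 10 ** ((cp - intercept) / slope)


def fold_enrichment(conc):
    """Normalize every region to the mean concentration of the negative controls."""
    baseline = mean(conc[r] for r in NEGATIVE)
    return {region: c / baseline for region, c in conc.items()}


def regulatory_enrichment(conc):
    """Hypothetical summary score: mean fold enrichment over the positive controls."""
    fe = fold_enrichment(conc)
    return mean(fe[r] for r in POSITIVE)


def best_dnase_level(samples):
    """Return the DNaseI concentration (U/mL) whose digest is most enriched.

    `samples` maps DNaseI U/mL -> {region name: measured concentration}.
    """
    return max(samples, key=lambda units: regulatory_enrichment(samples[units]))


if __name__ == "__main__":
    # Toy concentrations (arbitrary units); these are NOT measurements from this study.
    toy = {
        20: {**{r: 2.0 for r in POSITIVE}, "N5": 1.0, "N6": 1.0},
        40: {**{r: 3.0 for r in POSITIVE}, "N5": 1.0, "N6": 1.0},
    }
    print(best_dnase_level(toy))  # prints 40
```

Dividing by the mean of N5 and N6 mirrors the normalization stated in the text. Taking the maximum over concentrations corresponds to "the DNaseI treatment level that yielded the highest regulatory enrichment", under the assumed summary score.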
Peaks were filtered on the combination of their rank or score and their consistency across replicates, at an IDR threshold of 0.1. Peaks overlapping RepeatMasker repeats were omitted, and peaks in ENCODE blacklist regions (ce10 genomic regions known to produce artifactual signal in ChIP-seq experiments) were removed (ENCODE Project Consortium 2012). Overlapping peaks were merged to yield 41,825 and 23,670 DHS peaks across the embryo and L1 arrest biological replicates, respectively. DHS peaks were annotated as exonic (if 75% of the region was located in an exon), intronic, promoter (<300 bp from the ATG) or intergenic (>300 bp from the ATG) using custom scripts and WormBase WS241 gene models. Pseudogenes, tRNAs and ncRNAs were excluded from annotation. Footprints were identified in each biological replicate with the DNase2TF software package (Sung et al. 2014) at an FDR threshold of 0.05, by detecting regions of 6 to 40 bp within noncoding DHS that showed decreased read coverage and a strand shift in reads. Replicate data within each stage were then merged and used to identify additional TF footprints.
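The repeat/blacklist filtering, peak merging and annotation rules above can likewise be sketched. This is a simplified illustration under stated assumptions, not the custom scripts used in the study. Intervals are treated as 0-based and half-open. The `GeneModel` structure is hypothetical. Promoter distance is measured from the DHS midpoint to the ATG regardless of strand. The category priority (exon, then promoter, then intron, then intergenic) is assumed because the description does not state one. IDR filtering and DNase2TF footprinting are not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple

PROMOTER_DIST = 300   # bp from the ATG, per the description
EXON_FRACTION = 0.75  # fraction of a DHS that must lie in exons


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive

    def overlap(self, other: "Interval") -> int:
        """Number of bases shared with another interval (0 if disjoint)."""
        if self.chrom != other.chrom:
            return 0
        return max(0, min(self.end, other.end) - max(self.start, other.start))


@dataclass
class GeneModel:  # hypothetical, simplified stand-in for a WS241 gene model
    chrom: str
    start: int
    end: int
    atg: int
    exons: List[Tuple[int, int]]


def remove_overlapping(peaks, masks):
    """Drop any peak touching a repeat or blacklist interval (linear scan)."""
    return [p for p in peaks if not any(p.overlap(m) > 0 for m in masks)]


def merge_peaks(peaks):
    """Merge overlapping peaks (e.g. pooled across replicates) into single DHS."""
    merged = []
    for p in sorted(peaks, key=lambda iv: (iv.chrom, iv.start)):
        last = merged[-1] if merged else None
        if last and last.chrom == p.chrom and p.start < last.end:
            merged[-1] = Interval(last.chrom, last.start, max(last.end, p.end))
        else:
            merged.append(p)
    return merged


def annotate(dhs, genes):
    """Assign one category using the assumed priority exon > promoter > intron > intergenic."""
    length = dhs.end - dhs.start
    mid = (dhs.start + dhs.end) // 2
    same_chrom = [g for g in genes if g.chrom == dhs.chrom]
    for g in same_chrom:
        exonic = sum(dhs.overlap(Interval(g.chrom, s, e)) for s, e in g.exons)
        if exonic >= EXON_FRACTION * length:
            return "exon"
    if any(abs(mid - g.atg) < PROMOTER_DIST for g in same_chrom):
        return "promoter"
    # Simplification: midpoint inside a gene body but not mostly exonic -> intron.
    if any(g.start <= mid < g.end for g in same_chrom):
        return "intron"
    return "intergenic"
```

Linear scans over masks and genes keep the sketch short. For genome-wide peak sets, a sorted sweep or an interval tree (or a tool such as BEDTools) would be the usual choice.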

Created:

2018-01-13