DoubleChEC program to identify transcription factor binding sites from mapped ChEC-seq data
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.c866t1gd5
Description:
ChIP-seq (chromatin immunoprecipitation followed by sequencing) is commonly used to identify genome-wide protein-DNA interactions. However, ChIP-seq often gives a low yield, which is not ideal for quantitative analysis. An alternative to ChIP-seq is ChEC-seq (chromatin endogenous cleavage with high-throughput sequencing). In this method, the endogenous TF (transcription factor) of interest is fused to MNase (micrococcal nuclease), which non-specifically cleaves DNA near the TF's binding sites. Compared to the original ChEC-seq method, the modified version requires far less amplification. Because MACS3 failed to identify peaks in data generated with the modified ChEC-seq method, a new peak finder, DoubleChEC, was developed specifically for it.
The peak_finder/ directory contains three functions. callpeaks() identifies peaks from BAM files. goanalysis() makes GO (Gene Ontology) term plots from peaks. bedtomeme() is a wrapper function that performs MEME analysis in R once the MEME Suite is installed locally.
Methods
****EXCERPTED FROM BIORXIV PREPRINT; SEE PREPRINT OR PUBLISHED PAPER FOR REFERENCES AND DETAILS****
Yeast strains
All yeast strains were derived from BY4741. A C-terminal micrococcal nuclease fusion was introduced to the protein of interest through transformation and homologous recombination of PCR-amplified DNA. Primers were designed with 50 bp of homology to the 3’ end of the coding sequence of interest. The 3xFLAG-MNase with a KanR marker was amplified from pGZ108 (Zentner et al., 2015) and transformed into BY4741 as previously described. Successful transformation was confirmed by immunoblotting and PCR, followed by sequencing.
Lyophilized DNA oligonucleotides were resuspended in molecular-grade water to a concentration of 100 µM. For ligation, the following pair of oligonucleotides was annealed to produce the Y-adapter: Tn5ME-A (5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG-3’) and Y-Adapt-i5 R (5’-CTGTCTCTTATACACATCTTCATAGTAATCATC-3’). For Tn5 tagmentation, the following i7 oligonucleotides were annealed: Tn5ME-B (5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG-3’) and Tn5MErev (5’-PO4-CTGTCTCTTATACACATCT-3’). Pairs of oligonucleotides were annealed as follows: 45 µl of each oligo (100 µM) was combined with 10 µl of 1 M potassium acetate, 300 mM HEPES, pH 7.5 in a 0.2 ml PCR tube. In a thermocycler, the mixture was heated to 95°C for 4 minutes, cooled at 1°C/minute to 50°C, incubated at 50°C for 5 minutes, and then cooled at 1°C/minute to 4°C. Hybridized oligos were stored in 15 µl aliquots at -20°C.
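The sequences and volumes above allow a quick sanity check of the annealing step. The sketch below is illustrative Python and is not part of the DoubleChEC package; all function and variable names are hypothetical. It checks how many bases at the 3’ end of each top-strand oligo pair with the 5’ end of its partner, and it derives the concentration of each oligo in the annealing mix and the approximate program length, assuming the ramps run at exactly 1°C/minute.

```python
# Illustrative check of the oligo annealing step (names are hypothetical).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

TN5ME_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
Y_ADAPT_I5_R = "CTGTCTCTTATACACATCTTCATAGTAATCATC"
TN5ME_B = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"
TN5ME_REV = "CTGTCTCTTATACACATCT"  # the 5' phosphate is not represented


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def duplex_length(top, bottom):
    """Largest k such that the last k bases of `top` pair with the first k of `bottom`."""
    best = 0
    for k in range(1, min(len(top), len(bottom)) + 1):
        if reverse_complement(top[-k:]) == bottom[:k]:
            best = k
    return best


y_adapter_duplex = duplex_length(TN5ME_A, Y_ADAPT_I5_R)
i7_duplex = duplex_length(TN5ME_B, TN5ME_REV)

# 45 µl of each 100 µM oligo plus 10 µl of buffer gives a 100 µl mix.
oligo_um = 100 * 45 / (45 + 45 + 10)

# 4 min at 95°C, ramp to 50°C, 5 min hold, ramp to 4°C, both ramps at 1°C/min.
program_minutes = 4 + (95 - 50) / 1 + 5 + (50 - 4) / 1
```

Under these assumptions both pairs share a 19-nt duplex, each oligo ends up at 45 µM in the annealing mix, and the program takes about 100 minutes. The remaining 14 nt of Y-Adapt-i5 R fall outside this contiguous duplex, which is consistent with the Y shape of the adapter.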
Tn5 purification and adapter loading
Tn5 E54K L372P was purified as previously described (Hennig et al., 2017). We found that Tn5 was sufficiently pure following Ni2+ chromatography and we therefore omitted the final gel filtration step. Purified Tn5 was aliquoted and stored at -80°C. Optimal Tn5 activity was determined by cleaving genomic DNA and assessing fragmentation using the Femto Pulse (Figure S2d), and resulting DNA libraries were confirmed to be of appropriate length for Illumina sequencing by TapeStation (Figure S2e).
Tn5 was thawed on ice, and 100 µl Tn5 was added to 10 µl of annealed i7 oligonucleotides (45 µM) in a 1.7 ml tube and mixed by gentle pipetting. The mixture was incubated at 23°C, mixing at 350 rpm, for 45 minutes. Adapter-loaded Tn5 was stored at -20°C and used within 24 hours.
*************************************************************
Chromatin endogenous cleavage detailed protocol
Chromatin digestion
1. Grow cells in 10 ml overnight at 30°C, 200 rpm.
2. Dilute cultures into 50 ml media to OD600 ~ 0.1.
3. When cultures reach OD600 = 0.5 - 0.8, harvest 25 ODs (i.e., 50 ml if the OD600 = 0.5) of cells by centrifugation at 2500 x g for 1 minute.
4. Resuspend cells in 1 ml Buffer A and transfer to a 1.5 ml tube.
5. Pellet cells by centrifugation at 2500 x g for 1 minute, remove supernatant.
6. Wash cells 2 x 1 ml Buffer A, removing supernatant.
7. Resuspend cells in 600 µl Buffer A + 0.1% Digitonin.
8. Transfer tube to a 30°C heat block and incubate for 5 minutes.
9. Add 5 µl of 333 mM CaCl2, mix by inverting, and incubate at 30°C for the appropriate cleavage time (determined empirically for each protein).
10. To stop the reaction, remove 200 µl cells and combine with 200 µl 2x Stop Buffer.
11. Add 8 µl Proteinase K (20 µg/µl) and mix.
12. Incubate at 50°C, agitating 800 rpm for 30 minutes in a thermomixer.
13. Remove samples from thermomixer and cool at room temperature for 5 minutes.
14. Add 400 µl Phenol-Chloroform-Isoamyl Alcohol (25:24:1), pH 7.8, mix.
15. Centrifuge at 24,000 x g, 5 minutes.
16. Transfer aqueous phase to a phase-lock tube.
17. Add 200 µl Phenol-Chloroform-Isoamyl Alcohol (25:24:1), pH 7.8.
18. Invert 10x to mix.
19. Centrifuge at 24,000 x g for 5 minutes.
20. Transfer aqueous phase to a tube containing 1 ml 100% Ethanol.
21. Add 2 µl of linear acrylamide (5 µg/µl).
22. Invert 10x to mix.
23. Incubate at -80°C for 30 minutes.
24. Centrifuge at 24,000 x g, 4°C for 10 minutes.
25. Pour off supernatant.
26. Wash DNA pellet in 1 ml of 70% ethanol.
27. Centrifuge at 24,000 x g for 1 minute.
28. Pour off supernatant. Collect residual ethanol by centrifugation and remove by pipetting.
29. Dry DNA pellet until all ethanol has evaporated.
30. Add 58 µl of 10 mM Tris-HCl, pH 8.5 to DNA pellet.
31. Incubate overnight at room temperature.
32. Incubate at 37°C for 30 minutes.
33. Add 2 µl RNase A (10 µg/µl) to DNA.
34. Incubate at 37°C for 30 minutes.
35. Evaluate molecular weight of DNA by gel electrophoresis, 0.8% agarose or TapeStation.
36. Quantify DNA concentration with the Qubit double-stranded DNA, Broad Range Assay.
37. Store DNA at 4°C until library preparation (up to a month), then store at -20°C.
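The volumes in steps 3, 9, and 11 follow from simple arithmetic, sketched below in illustrative Python (not part of the DoubleChEC package; the function names are hypothetical). The CaCl2 calculation assumes the suspension volume is exactly the 600 µl of buffer added in step 7 and ignores the volume of the cell pellet.

```python
def harvest_volume_ml(od600, od_units=25.0):
    """Culture volume (ml) that contains `od_units` OD600 units of cells."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return od_units / od600


def final_concentration(stock_conc, added_volume, starting_volume):
    """Concentration after adding `added_volume` of stock to `starting_volume`."""
    return stock_conc * added_volume / (starting_volume + added_volume)


volume_at_0_5 = harvest_volume_ml(0.5)        # ml of culture at OD600 = 0.5
volume_at_0_8 = harvest_volume_ml(0.8)        # ml of culture at OD600 = 0.8
cacl2_mm = final_concentration(333, 5, 600)   # mM; volumes in µl
proteinase_k_ug = 8 * 20                      # µl x µg/µl = µg per sample
```

At OD600 = 0.5 the whole 50 ml culture is needed, while at OD600 = 0.8 about 31 ml suffices. Under the stated assumption the CaCl2 concentration during cleavage is roughly 2.75 mM, and each stopped sample receives 160 µg of Proteinase K.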
Buffer A
15 mM Tris-HCl, pH 7.5
80 mM KCl
0.1 mM EGTA
1.0 mM PMSF
0.5 mM Spermidine
0.2 mM Spermine
-Add 1 EDTA-Free Protease Inhibitor Tab per 50 ml Buffer A (Roche; Sigma # 11873580001)
2x Stop Buffer
400 mM NaCl
20 mM EDTA
4 mM EGTA
******************************************
Bioinformatic analysis
Quality Control, Trimming, and Mapping
Read quality and sequencer performance were evaluated with FastQC. Reads were adapter- and quality-trimmed with Trimmomatic (Bolger et al., 2014) using single-end settings. Bases at either end of a read were trimmed if base-call quality was less than 30, and only reads of length ≥25 bp were retained. Trimmed reads were mapped to the Saccharomyces cerevisiae genome (Engel et al., 2013), version R64-4-1, with Bowtie2 (Langmead and Salzberg, 2012), and mapped reads with MAPQ < 10 were removed with Samtools (Li et al., 2009).
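The filtering rules in this paragraph can be expressed as a short sketch. The Python below is for illustration only: it does not call Trimmomatic or Samtools, it omits adapter removal, and its function names are hypothetical. It assumes per-base qualities are already decoded to integer Phred scores.

```python
def trim_ends(seq, quals, min_quality=30):
    """Remove bases from both ends while their quality is below `min_quality`."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_quality:
        start += 1
    while end > start and quals[end - 1] < min_quality:
        end -= 1
    return seq[start:end], quals[start:end]


def keep_read(seq, min_length=25):
    """Retain only reads of at least `min_length` bases after trimming."""
    return len(seq) >= min_length


def keep_alignment(mapq, min_mapq=10):
    """Discard alignments with MAPQ below `min_mapq`."""
    return mapq >= min_mapq


# A 30-base read with two low-quality bases at each end survives trimming.
long_seq, long_quals = trim_ends("A" * 30, [10, 20] + [35] * 26 + [5, 12])

# A read with long low-quality ends drops below 25 bases and is discarded.
short_seq, short_quals = trim_ends("C" * 30, [10] * 10 + [35] * 15 + [10] * 5)
```

In this example the first read is trimmed to 26 bases and kept, while the second is trimmed to 15 bases and dropped by the length filter.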
DoubleChEC identification of high-confidence TF binding sites
For peak calling, BAM files for three or more biological replicates of the TF-MNase and of soluble MNase were read, and each read was trimmed to its first base pair. Unnormalized counts and normalized counts per million (CPMn) were tallied for each base pair in the yeast genome, and the average CPMn among replicates was calculated for each position. Next, mean CPMn values were smoothed using a sliding window of 3 and a step size of 2. Windows with CPMn values less than three times the genome average were filtered out. After this filtering, local maxima (windows with values greater than their immediate neighbors) were identified. Unnormalized reads were smoothed, and the positions identified as local maxima were retained and input to DESeq2 (version 1.36.0) to identify windows with values significantly higher than those in the soluble MNase control. Only TF-MNase peaks with a log2 fold change greater than 1.7 and an adjusted p-value less than 0.0001 over soluble MNase were retained. Finally, the peaks were filtered again to identify doublet peaks between 15 bp and 50 bp apart, which were merged into single peaks.
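The smoothing, thresholding, local-maximum, and doublet-merging steps can be sketched on a toy signal. The Python below is not the callpeaks() implementation and leaves out the DESeq2 comparison against soluble MNase entirely. Several details are assumptions made for illustration: each window is reported at its centre position, neighbors are the adjacent windows of the full smoothed track, and a merged doublet is placed at the midpoint of its two peaks.

```python
def smooth(values, window=3, step=2):
    """Window means as (centre position, mean) pairs, advancing by `step`."""
    out = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        out.append((start + window // 2, sum(chunk) / window))
    return out


def candidate_peaks(cpm, fold=3.0):
    """Centres of windows that are local maxima and reach `fold` x genome mean."""
    genome_mean = sum(cpm) / len(cpm)
    windows = smooth(cpm)
    peaks = []
    for i in range(1, len(windows) - 1):
        pos, value = windows[i]
        if value < fold * genome_mean:
            continue  # window filtered out by the genome-average threshold
        if value > windows[i - 1][1] and value > windows[i + 1][1]:
            peaks.append(pos)
    return peaks


def merge_doublets(positions, min_gap=15, max_gap=50):
    """Merge adjacent peaks `min_gap`..`max_gap` bp apart into their midpoint."""
    positions = sorted(positions)
    merged, i = [], 0
    while i < len(positions):
        if (i + 1 < len(positions)
                and min_gap <= positions[i + 1] - positions[i] <= max_gap):
            merged.append((positions[i] + positions[i + 1]) // 2)
            i += 2
        else:
            merged.append(positions[i])
            i += 1
    return merged


# Toy mean-CPM track: a doublet 30 bp apart plus one isolated peak.
cpm = [0.0] * 200
for centre in (51, 81, 151):
    cpm[centre - 1], cpm[centre], cpm[centre + 1] = 10.0, 30.0, 10.0

raw_peaks = candidate_peaks(cpm)
final_peaks = merge_doublets(raw_peaks)
```

On this toy track the two cleavage peaks 30 bp apart are called separately and then merged into one site, while the isolated peak is kept as is. In the actual workflow, the DESeq2 filter on log2 fold change and adjusted p-value sits between these two steps.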
GO term plot
A list of genes whose 700 bp upstream regions overlap with peaks identified by the peak finder was input to enrichGO (Wu et al., 2021) to generate GO term plots based on biological functions. The 10 most significant GO terms with adjusted p-values less than 0.05 were plotted.
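Selecting genes by upstream overlap can be sketched as follows. This Python is illustrative and is not the goanalysis() implementation; the gene tuples, peak tuples, and function names are hypothetical. It assumes 1-based inclusive gene coordinates, single-base peak positions, and that "upstream" is measured from the gene's start on its own strand.

```python
def upstream_region(start, end, strand, length=700):
    """Inclusive (lo, hi) interval covering `length` bp upstream of a gene."""
    if strand == "+":
        return max(1, start - length), start - 1
    return end + 1, end + length


def genes_with_upstream_peak(genes, peaks, length=700):
    """Names of genes whose upstream region contains at least one peak."""
    hits = []
    for name, chrom, start, end, strand in genes:
        lo, hi = upstream_region(start, end, strand, length)
        if any(c == chrom and lo <= pos <= hi for c, pos in peaks):
            hits.append(name)
    return hits


# Hypothetical genes: (name, chromosome, start, end, strand).
genes = [
    ("GENE_A", "chrI", 1000, 2000, "+"),
    ("GENE_B", "chrI", 3000, 4000, "-"),
    ("GENE_C", "chrII", 1000, 2000, "+"),
]
# Hypothetical peaks: (chromosome, position).
peaks = [("chrI", 500), ("chrI", 4100)]

selected = genes_with_upstream_peak(genes, peaks)
```

The resulting gene list is the kind of input that would then be passed to an enrichment tool such as enrichGO.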
MEME analyses
The MEME Suite (version 5.5.1) was installed locally, and two custom wrapper functions were written in R for the local bed2fasta and meme programs. These functions were used to convert BED files generated from peak calling into FASTA files, which were subsequently used to generate motif logos. Both bed2fasta and meme were run using their default parameter values.
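To show what the BED-to-FASTA conversion involves, here is a minimal pure-Python sketch. It does not call or reproduce the MEME Suite bed2fasta program or the R wrappers; the function name and the header format are hypothetical. It relies only on the standard BED convention of 0-based, half-open intervals and assumes the genome is held in memory as a dictionary of sequences.

```python
def bed_to_fasta(bed_lines, genome):
    """Build FASTA text for the first three columns of each BED line."""
    records = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        # BED intervals are 0-based and half-open, like Python slices.
        sequence = genome[chrom][start:end]
        records.append(f">{chrom}:{start}-{end}\n{sequence}\n")
    return "".join(records)


# Toy genome and peak intervals for illustration.
genome = {"chrI": "ACGTACGTAC"}
fasta = bed_to_fasta(["chrI\t2\t6", "chrI\t0\t3"], genome)
```

The FASTA text produced this way has one record per BED interval, which is the form of input a motif finder such as meme expects.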
Created:
2023-12-22



