DNA-Diffusion STARR-Seq Validation

NIAID Data Ecosystem · Indexed 2026-05-10

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP577018

Resource description:

To test how well DNA-Diffusion sequences induce transcription, we selected 2,150 sequences for each cell type (K562, HepG2, and GM12878), comprising DNA-Diffusion synthetic sequences and naturally occurring DHS sites, and inserted them into STARR-Seq plasmids (N = 6,450 sequences). All synthetic and naturally occurring sequences were combined into a single library, and this same library was tested experimentally using STARR-Seq in each of the three cell lines (K562, HepG2, GM12878).

Overall design: A STARR-Seq plasmid library containing 6,000 enhancer candidates was synthesized and constructed following previously described methods (Neumayr et al., Gordon et al.). Each candidate was uniquely barcoded with five random nucleotides during synthesis, PCR-amplified using specific primers (forward: TAGAGCATGCACCGG, reverse: TCGACGAATTCGGCC), and cloned into the STARR-Seq plasmid vector (hSTARR-seq_ORI, Addgene #99296) via NEBuilder HiFi assembly. Recombinant plasmids were electroporated into NEB® 10-beta electrocompetent E. coli, and the amplified plasmid libraries were purified using PureLink HiPure plasmid midiprep kits. Library complexity was assessed by sequencing PCR-amplified inserts prepared with NEBNext Ultra II DNA kits. Plasmid libraries of sufficient complexity were then electroporated into the three cell lines (K562, HepG2, GM12878), and RNA and DNA were isolated 6 hours after transfection. Following DNase treatment and reverse transcription of the isolated RNA, both RNA-derived cDNA and genomic DNA underwent targeted amplification with specific primer pairs and were sequenced using Illumina-compatible NEBNext library preparations. Counts of enhancer sequences were quantified using Kallisto, aligning reads to a constructed index with k-mer size 23 and filtering for the presence of specific homology arms.
The final enhancer activity (mRNA/DNA ratio) was computed with the R "mpra" package (version 1.14.0), employing voom normalization and limma analysis to generate log2 fold changes (log2FC) and associated error ranges across all cell line replicates, considering only sequences with non-zero DNA counts.
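As a rough illustration of the mRNA/DNA ratio, the following Python sketch computes a depth-normalized log2 ratio per sequence for a single replicate, skipping sequences with zero DNA counts. It is a simplified stand-in, not the voom/limma procedure in the R "mpra" package; the function name, the pseudocount, and the counts-per-million scaling are assumptions made for the example.

```python
import math


def log2_activity(rna_counts: dict, dna_counts: dict, pseudocount: float = 1.0) -> dict:
    """Per-sequence log2(RNA/DNA) after library-size normalization.

    Simplified illustration only: no voom weights, no limma model,
    no replicate handling. The pseudocount is an assumed choice.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("both libraries need at least one counted read")

    activity = {}
    for seq_id, dna in dna_counts.items():
        if dna == 0:
            continue  # only sequences with non-zero DNA counts are considered
        rna = rna_counts.get(seq_id, 0)
        rna_cpm = (rna + pseudocount) / rna_total * 1e6
        dna_cpm = (dna + pseudocount) / dna_total * 1e6
        activity[seq_id] = math.log2(rna_cpm / dna_cpm)
    return activity
```

A positive value indicates that a sequence is over-represented in RNA relative to its plasmid DNA input, which is the usual reading of enhancer activity in STARR-Seq.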

Created:

2025-10-07
