Decoding and Rewiring Promoter Architecture Using Large Language Models and Diffusion Frameworks

NIAID Data Ecosystem, indexed 2026-05-10
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP664679
Description:
High-performance promoters are essential tools for precisely regulating gene expression, yet their rational design within the vast combinatorial sequence space remains a major challenge. Here, we present a hybrid framework that integrates a large language model (LLM) with a diffusion model to enable data-driven and interpretable promoter design. The fine-tuned LLM predicts promoter strength with high accuracy and, through pseudo-sequence mutations, identifies biologically essential core motifs. A diffusion model is then conditioned on these motifs to reconstruct non-core regions and generate complete promoter sequences. We experimentally validated this approach in E. coli using high-throughput barcoded promoter activity sequencing: over 90% of the generated promoters showed measurable activity, and the best variants achieved approximately 20-fold higher expression than the benchmark promoter (BBa_J23119). By explicitly coupling interpretability with generative design, this strategy provides a generalizable path to accelerate synthetic biology efforts and advance large-scale regulatory sequence engineering.

Overall design: This study employs a high-throughput barcoded sequencing assay to quantify the transcriptional activities of a synthetic promoter library in Escherichia coli DH5α. Each promoter was cloned into a plasmid reporter construct and uniquely associated with a barcode. The pooled plasmid library was introduced into E. coli DH5α and cultured in LB medium supplemented with 50 µg/mL kanamycin at 37 °C with shaking. At a defined time point, cells from the same pooled cultures were harvested for parallel extraction of total RNA and plasmid DNA. Barcode abundances in RNA (after reverse transcription) and DNA were determined by high-throughput sequencing. Promoter activity was quantified from normalized RNA/DNA barcode ratios across biological replicates.
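
The final quantification step (normalized RNA/DNA barcode ratios across replicates) can be illustrated with a minimal Python sketch. The record does not specify the normalization scheme, so everything below is an assumption made for illustration: barcode counts are normalized to the fraction of reads in each library, the ratio is log2-transformed, barcodes with low DNA coverage are filtered by a hypothetical `min_dna_reads` threshold, and replicates are averaged. The function and variable names are likewise hypothetical and do not come from the study.

```python
import math
from statistics import mean


def normalize(counts):
    """Convert raw barcode read counts to fractions of the library total."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("library has no reads")
    return {bc: n / total for bc, n in counts.items()}


def promoter_activity(rna_reps, dna_reps, barcode_to_promoter, min_dna_reads=10):
    """Return {promoter: mean log2(RNA fraction / DNA fraction)}.

    rna_reps and dna_reps are lists of {barcode: read_count} dicts,
    one dict per biological replicate, paired by index.
    The threshold and the log2 transform are illustrative assumptions.
    """
    per_promoter = {}
    for rna, dna in zip(rna_reps, dna_reps):
        rna_frac = normalize(rna)
        dna_frac = normalize(dna)
        for bc, dna_n in dna.items():
            # Skip barcodes that are unmapped or too poorly covered in DNA.
            if bc not in barcode_to_promoter or dna_n < min_dna_reads:
                continue
            rna_f = rna_frac.get(bc, 0.0)
            if rna_f == 0.0:
                # No RNA reads: log ratio is undefined, so skip this barcode.
                continue
            log_ratio = math.log2(rna_f / dna_frac[bc])
            per_promoter.setdefault(barcode_to_promoter[bc], []).append(log_ratio)
    return {p: mean(vals) for p, vals in per_promoter.items()}


# Toy data: two replicates, two barcodes, one barcode per promoter.
barcode_map = {"AAA": "P1", "CCC": "P2"}
rna_replicates = [{"AAA": 300, "CCC": 100}, {"AAA": 300, "CCC": 100}]
dna_replicates = [{"AAA": 100, "CCC": 100}, {"AAA": 100, "CCC": 100}]

activity = promoter_activity(rna_replicates, dna_replicates, barcode_map)
```

In this toy input, P1 has a higher RNA fraction than DNA fraction and therefore a positive log ratio, while P2 has a negative one. A real analysis would also need to handle multiple barcodes per promoter, sequencing errors in barcodes, and replicate-level quality control, none of which are described in this record.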
Created:
2026-01-23