Decoding and Rewiring Promoter Architecture Using Large Language Models and Diffusion Frameworks
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://www.ncbi.nlm.nih.gov/sra/SRP664679
Description:
High-performance promoters are essential tools for precisely regulating gene expression, yet their rational design within the vast combinatorial sequence space remains a major challenge. Here, we present a hybrid framework that integrates a large language model (LLM) with a diffusion model to enable data-driven and interpretable promoter design. The fine-tuned LLM predicts promoter strength with high accuracy and, through pseudo-sequence mutations, identifies biologically essential core motifs. A diffusion model is then conditioned on these motifs to reconstruct non-core regions and generate complete promoter sequences. We experimentally validated this approach in E. coli by high-throughput barcoded promoter activity sequencing: over 90% of the generated promoters showed measurable activity, and the best variants achieved approximately 20-fold higher expression than the benchmark promoter (BBa_J23119). By explicitly coupling interpretability with generative design, this strategy provides a generalizable path to accelerate synthetic biology efforts and advance large-scale regulatory sequence engineering.
Overall design: This study employs a high-throughput barcoded sequencing assay to quantify the transcriptional activities of a synthetic promoter library in Escherichia coli DH5α. Each promoter was cloned into a plasmid reporter construct and uniquely associated with a barcode. The pooled plasmid library was introduced into E. coli DH5α and cultured in LB medium supplemented with 50 µg/mL kanamycin at 37 °C with shaking. At a defined time point, cells from the same pooled cultures were harvested for parallel extraction of total RNA and plasmid DNA. Barcode abundances in RNA (after reverse transcription) and DNA were determined by high-throughput sequencing. Promoter activity was quantified based on normalized RNA/DNA barcode ratios across biological replicates.
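The record does not spell out how the RNA/DNA barcode ratios were normalized, so the following is only a minimal sketch of one common way to compute such a score, not the authors' pipeline. It assumes depth normalization by library total, a log2 ratio per barcode, and a plain mean across replicates; the function names, the `min_dna_reads` cutoff, and the toy barcodes and counts are all hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean


def to_fractions(counts):
    """Scale raw barcode counts to fractions of the library total."""
    total = sum(counts.values())
    return {bc: n / total for bc, n in counts.items()}


def promoter_activity(rna_reps, dna_reps, barcode_to_promoter, min_dna_reads=10):
    """Return {promoter: mean log2(RNA fraction / DNA fraction)}.

    rna_reps and dna_reps are lists of {barcode: read count} dicts, one per
    biological replicate, paired by index. Within a replicate, a barcode is
    skipped if it has fewer than min_dna_reads DNA reads (assumed cutoff) or
    no RNA reads, since its ratio would be unreliable or undefined.
    """
    per_promoter = defaultdict(list)
    for rna, dna in zip(rna_reps, dna_reps):
        rna_frac, dna_frac = to_fractions(rna), to_fractions(dna)
        for bc, dna_reads in dna.items():
            if dna_reads < min_dna_reads or rna.get(bc, 0) == 0:
                continue
            ratio = rna_frac[bc] / dna_frac[bc]
            per_promoter[barcode_to_promoter[bc]].append(math.log2(ratio))
    return {p: mean(scores) for p, scores in per_promoter.items()}


# Toy data: two replicates, two barcodes, one barcode per promoter.
barcode_to_promoter = {"AAAC": "P_strong", "GGTT": "P_weak"}
rna_reps = [{"AAAC": 400, "GGTT": 100}, {"AAAC": 800, "GGTT": 200}]
dna_reps = [{"AAAC": 100, "GGTT": 100}, {"AAAC": 100, "GGTT": 100}]

activity = promoter_activity(rna_reps, dna_reps, barcode_to_promoter)
for promoter, score in sorted(activity.items()):
    print(f"{promoter}: log2(RNA/DNA) = {score:.3f}")
```

Dividing by the library total makes replicates with different sequencing depths comparable, and working in log2 space means a fold change relative to a reference promoter such as BBa_J23119 becomes a simple difference of scores.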
Created:
2026-01-23



