A Computational Framework for Identifying Promoter Sequences in Nonmodel Organisms Using RNA-seq Data Sets
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/A_Computational_Framework_for_Identifying_Promoter_Sequences_in_Nonmodel_Organisms_Using_RNA-seq_Data_Sets/14599280
Summary:
Engineering microorganisms into biological factories that convert renewable feedstocks into valuable materials is a major goal of synthetic biology; however, for many nonmodel organisms, we do not yet have the genetic tools, such as suites of strong promoters, necessary to engineer them effectively. In this work, we developed a computational framework that can leverage standard RNA-seq data sets to identify sets of constitutive, strongly expressed genes and predict strong promoter signals within their upstream regions. The framework was applied to a diverse collection of RNA-seq data measured for the methanotroph Methylotuvimicrobium buryatense 5GB1 and identified 25 genes that were constitutively and strongly expressed across 12 experimental conditions. For each gene, the framework predicted short (27–30 nucleotide) sequences as candidate promoters; from these predictions, it derived −35 and −10 consensus promoter motifs (TTGACA and TATAAT, respectively) for strong expression in M. buryatense. This consensus closely matches the canonical E. coli sigma-70 motif and was found to be enriched in promoter regions of the genome. A subset of the promoter predictions, including the consensus promoter, was experimentally validated in a XylE reporter assay, and the consensus promoter showed high expression. The pmoC, pqqA, and ssrA promoter predictions were additionally tested in an experiment that scrambled their −35 and −10 signal sequences, confirming that transcription initiation was disrupted when these specific regions of the predicted sequences were altered. These results indicate that the computational framework can make biologically meaningful promoter predictions and identify key pieces of regulatory systems that can serve as foundational tools for engineering diverse microorganisms for biomolecule production.
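The summary does not say how "constitutive, strongly expressed" genes are defined. As a minimal sketch, assuming a hypothetical criterion (a gene must rank in the top fraction of genes in every condition and have a low coefficient of variation across conditions), the gene-selection step could look like the Python below. The function name `constitutive_strong_genes`, the thresholds `top_fraction` and `max_cv`, and the toy values are illustrative assumptions, not the framework's actual implementation or data.

```python
from statistics import mean, pstdev


def constitutive_strong_genes(expression, top_fraction=0.1, max_cv=0.5):
    """Hypothetical filter for constitutive, strongly expressed genes.

    expression: dict mapping gene id -> list of normalized expression values
    (e.g. TPM), one per experimental condition, in the same condition order
    for every gene. Both thresholds are illustrative assumptions.
    """
    genes = list(expression)
    n_conditions = len(expression[genes[0]])
    k = max(1, int(len(genes) * top_fraction))  # genes counted as "strong" per condition

    # "Strong": the gene is among the top k genes in every condition.
    strong_everywhere = set(genes)
    for c in range(n_conditions):
        ranked = sorted(genes, key=lambda g: expression[g][c], reverse=True)
        strong_everywhere &= set(ranked[:k])

    # "Constitutive": low coefficient of variation (std / mean) across conditions.
    selected = []
    for g in strong_everywhere:
        values = expression[g]
        m = mean(values)
        cv = pstdev(values) / m if m > 0 else float("inf")
        if cv <= max_cv:
            selected.append((g, m, cv))
    # Highest mean expression first.
    return sorted(selected, key=lambda t: t[1], reverse=True)


# Toy values for illustration only (not data from the study): 4 genes x 3 conditions.
toy = {
    "geneA": [900, 950, 880],   # high and stable
    "geneB": [100, 50, 80],     # never in the top half
    "geneC": [10, 12, 11],      # stable but weak
    "geneD": [300, 900, 2500],  # always in the top half but highly variable
}
for gene, avg, cv in constitutive_strong_genes(toy, top_fraction=0.5, max_cv=0.5):
    print(f"{gene}\tmean={avg:.1f}\tCV={cv:.3f}")
```

With these toy values only geneA passes both filters: geneD clears the rank filter in all three conditions but is rejected for its high variability. Ranking per condition rather than using a single global expression cutoff keeps the filter insensitive to differences in library depth or normalization between conditions.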
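The summary does not describe how the framework locates promoter signals within upstream regions. Purely as an illustration, the Python sketch below scores candidate windows against the reported −35 (TTGACA) and −10 (TATAAT) consensus hexamers, derives a position-wise majority consensus from aligned motif instances, and scrambles the two boxes of a window in the spirit of the pmoC, pqqA, and ssrA experiment. It assumes spacers of 15–18 nt, chosen here only because 6 + spacer + 6 then spans the 27–30 nt length reported above. The function names, the retry-based shuffle, and the toy sequence are hypothetical and are not the published method.

```python
import random
from collections import Counter

# -35 and -10 consensus hexamers reported for strong expression in M. buryatense.
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def best_sigma70_window(upstream, spacers=range(15, 19)):
    """Return (mismatch score, start, spacer) of the best -35/spacer/-10 window.

    Each window is a 6-nt -35 box, a spacer, and a 6-nt -10 box, so spacers of
    15-18 nt give 27-30 nt windows. Returns None if the sequence is too short.
    """
    seq = upstream.upper()
    best = None
    for spacer in spacers:
        window_len = 12 + spacer
        for start in range(len(seq) - window_len + 1):
            box35 = seq[start:start + 6]
            box10 = seq[start + 6 + spacer:start + window_len]
            score = mismatches(box35, MINUS35) + mismatches(box10, MINUS10)
            if best is None or score < best[0]:  # keep the first lowest score
                best = (score, start, spacer)
    return best


def consensus(instances):
    """Position-wise majority base across aligned, equal-length motif instances."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*instances))


def scramble_boxes(window, spacer, rng):
    """Shuffle the bases within the -35 and -10 boxes; leave the spacer untouched."""
    def shuffle(box):
        candidate = box
        for _ in range(20):  # retry so the box actually changes when possible
            bases = list(box)
            rng.shuffle(bases)
            candidate = "".join(bases)
            if candidate != box:
                break
        return candidate

    return shuffle(window[:6]) + window[6:6 + spacer] + shuffle(window[6 + spacer:12 + spacer])


# Toy upstream region (not from the study): GC flank, -35 box, 17-nt spacer, -10 box, GC flank.
upstream = "GCGCGC" + "TTGACA" + "ATGCATGCATGCATGCA" + "TATAAT" + "GCGCGC"
score, start, spacer = best_sigma70_window(upstream)
window = upstream[start:start + 12 + spacer]
print(score, start, spacer, window)
print(consensus(["TTGACA", "TTGACT", "TTTACA", "CTGACA"]))
print(scramble_boxes(window, spacer, random.Random(0)))
```

A plain mismatch count weights every position equally; a position weight matrix built from the aligned instances would give conserved positions more influence. A scrambled box can also land close to the consensus again by chance, so in practice it would be worth rescoring each scrambled variant before testing it.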
Created:
2021-05-14



