A Computational Framework for Identifying Promoter Sequences in Nonmodel Organisms Using RNA-seq Data Sets

Source: NIAID Data Ecosystem (indexed 2026-03-12)

Download link:

https://figshare.com/articles/dataset/A_Computational_Framework_for_Identifying_Promoter_Sequences_in_Nonmodel_Organisms_Using_RNA-seq_Data_Sets/14599280

Description:

Engineering microorganisms into biological factories that convert renewable feedstocks into valuable materials is a major goal of synthetic biology; however, for many nonmodel organisms, we do not yet have the genetic tools, such as suites of strong promoters, necessary to effectively engineer them. In this work, we developed a computational framework that can leverage standard RNA-seq data sets to identify sets of constitutive, strongly expressed genes and predict strong promoter signals within their upstream regions. The framework was applied to a diverse collection of RNA-seq data measured for the methanotroph Methylotuvimicrobium buryatense 5GB1 and identified 25 genes that were constitutively, strongly expressed across 12 experimental conditions. For each gene, the framework predicted short (27–30 nucleotide) sequences as candidate promoters and derived −35 and −10 consensus promoter motifs (TTGACA and TATAAT, respectively) for strong expression in M. buryatense. This consensus closely matches the canonical E. coli sigma-70 motif and was found to be enriched in promoter regions of the genome. A subset of promoter predictions was experimentally validated in a XylE reporter assay, including the consensus promoter, which showed high expression. The pmoC, pqqA, and ssrA promoter predictions were additionally screened in an experiment that scrambled the −35 and −10 signal sequences, confirming that transcription initiation was disrupted when these specific regions of the predicted sequence were altered. These results indicate that the computational framework can make biologically meaningful promoter predictions and identify key pieces of regulatory systems that can serve as foundational tools for engineering diverse microorganisms for biomolecule production.
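The two steps described above (selecting constitutive, strongly expressed genes, then looking for −35/−10 signals upstream of them) can be illustrated with a minimal Python sketch. This is not the authors' framework: the function names, the mean and coefficient-of-variation thresholds, the mismatch tolerance, and the 15 to 18 nt spacer range are all assumptions chosen for illustration. Only the two consensus hexamers (TTGACA and TATAAT) come from the description. With a 15 to 18 nt spacer, a candidate spans 27 to 30 nt, which is consistent with the lengths reported above, though the original predictions may be composed differently.

```python
from statistics import mean, pstdev

# Consensus hexamers reported in the description above.
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def constitutive_strong(expr, min_mean=100.0, max_cv=0.3):
    """Pick genes with high, stable expression across conditions.

    expr maps gene name -> list of normalized expression values, one per
    condition. min_mean and max_cv are hypothetical thresholds.
    """
    selected = []
    for gene, values in expr.items():
        m = mean(values)
        # Coefficient of variation (std / mean) as a simple stability measure.
        if m > 0 and m >= min_mean and pstdev(values) / m <= max_cv:
            selected.append(gene)
    return selected


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def scan_upstream(seq, spacer_range=(15, 18), max_mismatches=2):
    """Find -35/-10 box pairs in an upstream sequence.

    Returns a sorted list of (total_mismatches, start, candidate) tuples,
    best match first. spacer_range and max_mismatches are assumptions.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - len(MINUS_35) + 1):
        box35 = seq[start:start + 6]
        for spacer in range(spacer_range[0], spacer_range[1] + 1):
            s10 = start + 6 + spacer
            box10 = seq[s10:s10 + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            total = mismatches(box35, MINUS_35) + mismatches(box10, MINUS_10)
            if total <= max_mismatches:
                hits.append((total, start, seq[start:s10 + 6]))
    return sorted(hits)
```

In use, `constitutive_strong` would be given a table of normalized expression values per condition, and `scan_upstream` would then be run on the upstream region of each selected gene. A real analysis would likely use a dedicated motif-discovery tool rather than a fixed-consensus scan.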

Created:

2021-05-14
