Learning the sequence code for mRNA and protein abundance in human immune cells [SONAR_MPRA]

NIAID Data Ecosystem, indexed 2026-05-02

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP569834

Resource description:

mRNA and protein abundance are defined by transcriptional and post-transcriptional regulatory mechanisms. Here, we develop a machine learning pipeline, termed SONAR, to decipher the endogenous sequence code that determines mRNA and protein abundance in human cells. SONAR models predict up to 62% of mRNA and 63% of protein abundance independent of promoter or enhancer information, and reveal a strong, yet dynamic, cell-type-specific sequence code. We also find that the effect of sequence features depends on their location within the mRNA transcript. Using SONAR, we design synthetic 3'UTRs with which protein expression levels can be manipulated and tailored to a specific cell type. Beyond its fundamental findings, our work provides novel means to improve immunotherapies and biotechnology applications.

Overall design: A parallel reporter assay was performed to test the effect of synthetic 3'UTR sequences on GFP protein expression. HEK cells, CD8+ T cells, and CD4+ T cells were transduced with a GFP-3'UTR retroviral library containing approximately 500 distinct synthetic 3'UTR sequences. Transduced cells were subsequently sorted into GFPhi and GFPlo populations. gDNA was isolated from these populations, and the 3'UTR sequences were amplified and sequenced to assess their abundances.
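One common way to summarize a sort-and-sequence reporter assay like the one described above is a per-sequence log2 enrichment between the GFPhi and GFPlo populations. The sketch below illustrates that idea only. It is not the authors' analysis pipeline. The function name `log2_enrichment`, the pseudocount, the frequency normalization, and the toy read counts are all assumptions made for illustration.

```python
import math


def log2_enrichment(hi_counts, lo_counts, pseudocount=1.0):
    """Return log2(GFPhi frequency / GFPlo frequency) for each 3'UTR.

    hi_counts / lo_counts map a 3'UTR identifier to its read count in the
    GFPhi / GFPlo sorted population. Both the pseudocount and the
    normalization to library size are illustrative assumptions.
    """
    utrs = sorted(set(hi_counts) | set(lo_counts))
    # Library sizes after adding the pseudocount to every sequence.
    hi_total = sum(hi_counts.get(u, 0) + pseudocount for u in utrs)
    lo_total = sum(lo_counts.get(u, 0) + pseudocount for u in utrs)

    scores = {}
    for u in utrs:
        hi_freq = (hi_counts.get(u, 0) + pseudocount) / hi_total
        lo_freq = (lo_counts.get(u, 0) + pseudocount) / lo_total
        scores[u] = math.log2(hi_freq / lo_freq)
    return scores


# Hypothetical read counts for three synthetic 3'UTRs (not real data).
hi = {"utr_A": 300, "utr_B": 100, "utr_C": 100}
lo = {"utr_A": 100, "utr_B": 100, "utr_C": 300}

scores = log2_enrichment(hi, lo)
for utr, score in scores.items():
    print(f"{utr}: {score:+.2f}")
```

A positive score means the sequence is over-represented in GFPhi cells, which suggests the 3'UTR supports higher GFP expression. A negative score suggests the opposite. Computing the scores separately for HEK, CD8+ T, and CD4+ T cells would allow a cell-type comparison.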


Created:

2025-05-15