Learning the sequence code for mRNA and protein abundance in human immune cells

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://www.ncbi.nlm.nih.gov/sra/SRP455263

Description:

mRNA and protein abundance are defined by transcriptional and post-transcriptional regulatory mechanisms. Here, we develop a machine learning pipeline, termed SONAR, to decipher the endogenous sequence code that determines mRNA and protein abundance in human cells. SONAR models predict up to 62% of mRNA and 63% of protein abundance independent of promoter or enhancer information, and reveal a strong, yet dynamic, cell-type-specific sequence code. We also find that the effect of sequence features depends on their location within the mRNA transcript. Using SONAR, we design synthetic 3'UTRs with which protein expression levels can be manipulated and tailored to a specific cell type. Beyond its fundamental findings, our work provides novel means to improve immunotherapies and biotechnology applications.

Overall design: A parallel reporter assay was performed to test the effect of synthetic 3'UTR sequences on GFP protein expression. HeLa cells, HEK cells, CD8+ T cells and CD4+ T cells were transduced with a GFP-3'UTR retroviral library containing approximately 500 distinct synthetic 3'UTR sequences. Transduced cells were subsequently sorted into GFPhi and GFPlo populations. gDNA was isolated from these populations, and the 3'UTR sequences were amplified and sequenced to assess their abundance.
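One common way to summarize a sort-and-sequence readout like the one described above is a per-sequence log2 ratio of read fractions between the GFPhi and GFPlo populations. The sketch below is illustrative only: the function name, the pseudocount, and the read counts are assumptions made for the example, and it does not represent the authors' analysis or the actual contents of the deposited data.

```python
import math

def log2_enrichment(hi_counts, lo_counts, pseudocount=1.0):
    """Per-sequence log2 ratio of GFPhi to GFPlo read fractions.

    hi_counts / lo_counts map a 3'UTR identifier to its read count in the
    sorted population. A pseudocount (an assumption of this sketch) avoids
    division by zero for sequences missing from one population.
    """
    ids = sorted(set(hi_counts) | set(lo_counts))
    hi_total = sum(hi_counts.values()) + pseudocount * len(ids)
    lo_total = sum(lo_counts.values()) + pseudocount * len(ids)
    scores = {}
    for seq_id in ids:
        hi_frac = (hi_counts.get(seq_id, 0) + pseudocount) / hi_total
        lo_frac = (lo_counts.get(seq_id, 0) + pseudocount) / lo_total
        scores[seq_id] = math.log2(hi_frac / lo_frac)
    return scores

# Made-up read counts for three hypothetical 3'UTR sequences.
hi = {"utr_A": 300, "utr_B": 100, "utr_C": 100}
lo = {"utr_A": 100, "utr_B": 100, "utr_C": 300}
scores = log2_enrichment(hi, lo)
for seq_id, score in scores.items():
    print(f"{seq_id}\t{score:+.2f}")
```

Under this convention, a positive score means the sequence is relatively more abundant in the GFPhi population and a negative score means it is relatively more abundant in GFPlo.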

Created:

2023-09-20
