Learning the sequence code for mRNA and protein abundance in human immune cells

Source: NIAID Data Ecosystem (indexed 2026-05-01)

Download link:

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE240919

Description:

mRNA and protein abundance are defined by transcriptional and post-transcriptional regulatory mechanisms. Here, we develop a machine learning pipeline, termed SONAR, to decipher the endogenous sequence code that determines mRNA and protein abundance in human cells. SONAR models predict up to 62% of mRNA and 63% of protein abundance independent of promoter or enhancer information, and reveal a strong, yet dynamic, cell-type-specific sequence code. We also find that the effect of sequence features depends on their location within the mRNA transcript. Using SONAR, we design synthetic 3'UTRs with which protein expression levels can be manipulated and tailored to a specific cell type. Beyond its fundamental findings, our work provides novel means to improve immunotherapies and biotechnology applications.

A parallel reporter assay was performed to test the effect of synthetic 3'UTR sequences on GFP protein expression. HeLa cells, HEK cells, CD8+ T cells and CD4+ T cells were transduced with a GFP-3'UTR retroviral library containing approximately 500 distinct synthetic 3'UTR sequences. Transduced cells were subsequently sorted into GFP-high (GFPhi) and GFP-low (GFPlo) populations. gDNA was isolated from these populations, and the 3'UTR sequences were amplified and sequenced to assess their abundances.
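As an illustration of how read counts from the sorted GFPhi and GFPlo populations might be compared, the following minimal Python sketch computes a per-sequence log2 enrichment score. It is not the authors' analysis method: the function name `log2_enrichment`, the pseudocount, the library-size normalization, and the example counts are all assumptions made for illustration only.

```python
import math
from typing import Dict


def log2_enrichment(
    hi_counts: Dict[str, int],
    lo_counts: Dict[str, int],
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return log2(GFPhi fraction / GFPlo fraction) for each 3'UTR ID.

    Counts are converted to fractions of each library so that differences
    in sequencing depth between the two sorted populations cancel out.
    The pseudocount (an assumed value) avoids division by zero for
    sequences missing from one population.
    """
    ids = sorted(set(hi_counts) | set(lo_counts))
    hi_total = sum(hi_counts.get(i, 0) + pseudocount for i in ids)
    lo_total = sum(lo_counts.get(i, 0) + pseudocount for i in ids)

    scores = {}
    for i in ids:
        hi_frac = (hi_counts.get(i, 0) + pseudocount) / hi_total
        lo_frac = (lo_counts.get(i, 0) + pseudocount) / lo_total
        scores[i] = math.log2(hi_frac / lo_frac)
    return scores


# Hypothetical read counts, not taken from the dataset.
hi = {"utr_A": 300, "utr_B": 100, "utr_C": 100}
lo = {"utr_A": 100, "utr_B": 100, "utr_C": 300}
scores = log2_enrichment(hi, lo)
```

Under these assumptions, a positive score marks a 3'UTR that is over-represented in the GFPhi population (consistent with higher reporter expression), and a negative score marks one over-represented in GFPlo.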

Created:

2023-09-20
