Deeply Mining a Universe of Peptides Encoded by Long Noncoding RNAs

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://www.omicsdi.org/dataset/pride/PXD016981

Description:

Long non-coding RNAs (lncRNAs) are generally defined as RNA transcripts longer than 200 nucleotides that are not translated into proteins. Recently, many small open reading frames (smORFs) embedded in lncRNA transcripts have been shown to encode functional polypeptides (termed lncRNA-SEPs here). Although combined analysis by advanced genomics, bioinformatics and proteomics has largely driven SEP discovery, the poor predictability, diminutive size and low abundance of SEPs still make their systematic identification in different biological samples challenging. Here, we took advantage of the NONCODE database, which holds the most complete collection and annotation of lncRNA transcripts from different species, to build a database that collects as many putative small ORFs as possible from human and mouse lncRNA transcripts. Two effective and complementary polypeptide enrichment strategies (a 30 kDa MWCO filter and a C8 SPE column) were also integrated to further improve the discovery of novel lncRNA-SEPs. These efforts led to the discovery of 362 lncRNA-SEPs from 8 human cell lines and 238 lncRNA-SEPs from 3 mouse cell lines and 8 mouse tissues. Of these, 18 lncRNA-SEPs were verified experimentally in 293T cells by multiple technologies, including in vitro expression, immunoblotting and parallel reaction monitoring-based mass spectrometry (PRM-MS). Further bioinformatic analysis reveals that the physical and chemical properties of these novel lncRNA-SEPs, such as amino acid composition and codon usage, differ from those of canonical proteins. Intriguingly, nearly 70% of the identified lncRNA-SEPs were found to be initiated with non-AUG start codons. Collectively, the efficient workflows presented in this study enabled us to identify 600 novel lncRNA-SEPs from multiple cell lines and tissues, which should represent the largest number of MS-detected lncRNA-encoded SEPs reported to date. These novel lncRNA-SEPs could not only provide new clues for the annotation of noncoding elements in the genome, but also serve as a valuable resource for the functional characterization of individual lncRNA-SEPs.
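
The description says that putative small ORFs were collected from lncRNA transcripts and that many identified peptides start at non-AUG codons. As a rough illustration of what such a collection step could look like, here is a minimal Python sketch. It is not the authors' pipeline: the function name `find_smorfs`, the near-cognate start codon set, the forward-strand-only scan and the `max_codons=100` cutoff are all assumptions made for this example.

```python
# Illustrative sketch only. The start-codon set and length cutoffs below are
# assumptions for this example, not parameters taken from the study.
STOP_CODONS = {"TAA", "TAG", "TGA"}
# ATG plus the nine codons that differ from ATG by a single base (assumed set).
START_CODONS = {"ATG", "CTG", "GTG", "TTG", "ACG", "AAG", "AGG", "ATA", "ATC", "ATT"}


def find_smorfs(transcript, max_codons=100, min_codons=2):
    """Return (start, end, start_codon) for putative small ORFs on the forward strand.

    start and end are 0-based and end-exclusive; the span includes the stop codon.
    Lengths are counted in codons, excluding the stop codon.
    """
    seq = transcript.upper().replace("U", "T")  # accept RNA or DNA letters
    orfs = []
    for frame in range(3):
        open_starts = []  # candidate starts seen since the last stop in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOP_CODONS:
                for s in open_starts:
                    n_codons = (i - s) // 3
                    if min_codons <= n_codons <= max_codons:
                        orfs.append((s, i + 3, seq[s:s + 3]))
                open_starts = []
            elif codon in START_CODONS:
                open_starts.append(i)
    return sorted(orfs)


if __name__ == "__main__":
    # Toy sequence, not real data: one ORF starting at a non-AUG (CUG) codon.
    for start, end, codon in find_smorfs("CCCUGAAAUGA"):
        print(start, end, codon)
```

Every in-frame start codon upstream of a stop yields its own candidate, so nested ORFs sharing a stop are all reported. A real database build would also need a rule for picking among them and for translating each candidate into a peptide sequence.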


Created:

2023-05-06
