Quick and clean: Cracking E. coli’s code by LC-MS/MS, de novo sequencing, and dictionary search

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://www.omicsdi.org/dataset/pride/PXD012015

Description:

In this study, we faced the challenge of deciphering a protein that had been designed and expressed in E. coli such that its amino acid sequence encodes two concatenated English sentences. The sequence carried unknown modifications and could not be found online. The letters ‘O’ and ‘U’ are both replaced by ‘K’ in the protein. To solve the challenge, we developed a workflow consisting of shotgun proteomics, de novo sequencing, and a bioinformatic tool that searches for words in the identified sequences. Using this workflow, we assembled the first complete English sentence and validated it by searching against a customized sequence database.
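The dictionary-search step described above might look roughly like the following sketch. It is not the authors' tool: the function names, the example peptide, and the word list are hypothetical. It relies only on the stated rule that ‘O’ and ‘U’ are written as ‘K’, plus one added assumption that I and L are treated as interchangeable, since leucine and isoleucine have the same mass and de novo sequencing usually cannot tell them apart.

```python
# The 20 standard amino acid one-letter codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def encode_word(word):
    """Write an English word in the protein alphabet (O and U become K).

    Returns None when the word uses a letter with no standard
    amino acid code (B, J, X, Z), since it cannot appear in the protein.
    """
    encoded = word.upper().replace("O", "K").replace("U", "K")
    return encoded if set(encoded) <= AMINO_ACIDS else None


def normalize(seq):
    """Assumption: collapse I into L because the two residues are isobaric."""
    return seq.upper().replace("I", "L")


def find_words(peptide, words, min_len=3):
    """Return sorted (position, word) hits of dictionary words in a peptide."""
    target = normalize(peptide)
    hits = []
    for word in words:
        if len(word) < min_len:
            continue  # very short words match almost anywhere by chance
        encoded = encode_word(word)
        if encoded is None:
            continue
        pattern = normalize(encoded)
        start = target.find(pattern)
        while start != -1:
            hits.append((start, word.lower()))
            start = target.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical de novo peptide and a toy dictionary, for illustration only.
peptide = "QKICKANDCLEAN"
dictionary = ["quick", "and", "clean", "box", "code"]
print(find_words(peptide, dictionary))
```

The sketch encodes each dictionary word rather than decoding the peptide, because a ‘K’ in the protein is ambiguous: it may stand for K, O, or U. Encoding the word avoids enumerating every possible decoding of the sequence.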

Created:

2020-05-26