
Proteoform-predictor: Increasing the Phylogenetic Reach of Top-Down Proteomics

Source: Figshare; updated 2025-03-10; indexed 2026-04-28
Download link:
https://figshare.com/articles/dataset/Proteoform-predictor_Increasing_the_Phylogenetic_Reach_of_Top-Down_Proteomics/28566162
Description:
Proteoforms are distinct molecular forms of proteins that act as building blocks of organisms, with post-translational modifications (PTMs) being one of the key changes that generate these variations. Mass spectrometry (MS)-based top-down proteomics (TDP) is the leading technology for proteoform identification because it preserves intact proteoforms for analysis, making it well-suited for comprehensive PTM characterization. A crucial step in TDP is searching MS data against a database of candidate proteoforms. To extend the reach of TDP to organisms with limited PTM annotations, we developed Proteoform-predictor, an open-source tool that integrates homology-based PTM site prediction into proteoform database creation. The tool aligns (registers) homologous sequences and then creates databases of proteoform candidates, transferring PTM sites from well-characterized species to those with less comprehensive proteomic data. Our tool features a user-friendly interface and an intuitive workflow, making it accessible to a wide range of researchers. By comparing the proteomes of three bacterial strains with the reference proteome of Escherichia coli (E. coli) K12, we demonstrate that Proteoform-predictor expands their proteoform databases by tens of thousands of proteoforms. Subsequent TDP analysis of Serratia marcescens (S. marcescens) and Salmonella typhimurium (S. typhimurium) demonstrated significant improvements in protein and proteoform identification, even for proteins with variant sequences. As TDP technology advances, Proteoform-predictor will become an important tool for extending proteoform identification and PTM biology to more diverse species across the phylogenetic tree of life.
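The homology-based transfer step described above can be illustrated with a minimal Python sketch. It assumes the reference and target proteins have already been aligned by some pairwise aligner into two gapped strings of equal length ('-' marks a gap), and it keeps a reference PTM site only when the aligned target residue is identical, a conservative rule chosen here purely for illustration. The function names, the 0-based `(position, ptm)` site format, the `max_mods` cap, and the bracketed output notation (loosely in the style of ProForma) are all hypothetical; this is not Proteoform-predictor's actual code or API.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Site = Tuple[int, str]  # hypothetical format: (0-based residue position, PTM name)


def map_positions(ref_aln: str, tgt_aln: str) -> Dict[int, int]:
    """Map ungapped reference positions to ungapped target positions.

    Only columns where both sequences have a residue (no gap) are mapped.
    """
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping: Dict[int, int] = {}
    r = t = 0
    for a, b in zip(ref_aln, tgt_aln):
        if a != "-" and b != "-":
            mapping[r] = t
        if a != "-":
            r += 1
        if b != "-":
            t += 1
    return mapping


def transfer_sites(ref_aln: str, tgt_aln: str, ref_sites: List[Site]) -> List[Site]:
    """Transfer reference PTM sites onto the target sequence.

    Assumed rule: a site is kept only if it aligns to a target residue
    and that residue is identical to the reference residue.
    """
    mapping = map_positions(ref_aln, tgt_aln)
    ref_seq = ref_aln.replace("-", "")
    tgt_seq = tgt_aln.replace("-", "")
    transferred: List[Site] = []
    for pos, ptm in ref_sites:
        t = mapping.get(pos)
        if t is not None and ref_seq[pos] == tgt_seq[t]:
            transferred.append((t, ptm))
    return transferred


def enumerate_proteoforms(sites: List[Site], max_mods: int = 2) -> Iterator[Tuple[Site, ...]]:
    """Yield candidate PTM sets with at most max_mods sites (the empty set is the unmodified form).

    Combinations placing two PTMs on the same residue are skipped.
    """
    for k in range(max_mods + 1):
        for combo in combinations(sites, k):
            if len({pos for pos, _ in combo}) == len(combo):
                yield combo


def render(seq: str, mods: Tuple[Site, ...]) -> str:
    """Write a candidate as a bracketed string, e.g. MK[Acetyl]SAT[Phospho]PRY."""
    labels = dict(mods)
    return "".join(f"{aa}[{labels[i]}]" if i in labels else aa for i, aa in enumerate(seq))


if __name__ == "__main__":
    ref_aln, tgt_aln = "MKS-TPKY", "MKSATPRY"  # toy alignment with one insertion and one K->R change
    sites = transfer_sites(ref_aln, tgt_aln, [(1, "Acetyl"), (3, "Phospho"), (5, "Methyl")])
    for cand in enumerate_proteoforms(sites, max_mods=2):
        print(render(tgt_aln.replace("-", ""), cand))
```

In this toy example the acetylation and phosphorylation sites carry over (the phosphosite shifts by one position because of the insertion), while the methylation site is dropped because the target has R where the reference has K, leaving four candidates from the unmodified form up to the doubly modified one. A real tool would likely use more permissive or residue-class-aware transfer rules and cap combinatorial growth more carefully.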

Created:
2025-03-10