Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity

Name: Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity
Creator: ACS Publications
Published: 2023-06-01 00:00:00
License: No description available

Source: acs.figshare.com. Updated 2023-06-01. Indexed 2025-03-22.

Download link:

https://acs.figshare.com/articles/dataset/Exhaustively_Identifying_Cross-Linked_Peptides_with_a_Linear_Computational_Complexity/5367301/1

Resource description:

Chemical cross-linking coupled to mass spectrometry is a powerful tool to study protein–protein interactions and protein conformations. Two linked peptides are ionized and fragmented to produce a tandem mass spectrum. In such an experiment, a tandem mass spectrum contains ions from two peptides, so the peptide identification problem becomes a peptide–peptide pair identification problem. Currently, most tools do not search all possible pairs because doing so has quadratic time complexity. Consequently, missed findings are unavoidable. In our previous work, we developed a tool named ECL to search all pairs of peptides exhaustively. Unfortunately, it is very slow due to its quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, while researchers have proposed that it is inappropriate to directly compare uncalibrated scores because different spectra have different random score distributions. Here we propose an advanced version of ECL, named ECL2. It achieves linear time and space complexity by taking advantage of the additive property of a score function. It can search a data set containing tens of thousands of spectra against a database containing thousands of proteins in a few hours. Comparison with five other state-of-the-art tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only one that is faster than ECL2, but Kojak does not exhaustively search all possible peptide pairs. The comparison shows that ECL2 has the highest sensitivity among the state-of-the-art tools. The experiment using a large-scale in vivo cross-linking data set demonstrates that ECL2 is the only tool that can find peptide-spectrum matches (PSMs) passing the false discovery rate/q-value threshold. The result illustrates that exhaustive search and a well-calibrated score function are useful for finding PSMs in a huge search space.
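The abstract does not spell out how the additive property removes the quadratic cost, so the following is only a minimal sketch of one way the idea can work, not ECL2's actual implementation. It assumes that a pair's score is the sum of two per-peptide scores and that a valid pair must have binned masses summing to a target (precursor mass minus linker mass, binned). Under those assumptions, only the best-scoring peptide per mass bin needs to be kept, and each bin is matched to its complementary bin with one dictionary lookup. All names (`to_bin`, `best_pair_linear`, `best_pair_quadratic`, the tuple layout) are hypothetical, and link-site dependence and mass tolerance windows are deliberately ignored.

```python
from itertools import combinations_with_replacement


def to_bin(mass, bin_width):
    """Map a mass to an integer bin index (hypothetical binning scheme)."""
    return int(round(mass / bin_width))


def best_pair_quadratic(peptides, target_bin, bin_width):
    """Baseline: score every pair whose binned masses sum to target_bin."""
    best = None
    for (na, ma, sa), (nb, mb, sb) in combinations_with_replacement(peptides, 2):
        if to_bin(ma, bin_width) + to_bin(mb, bin_width) == target_bin:
            if best is None or sa + sb > best[0]:
                best = (sa + sb, na, nb)
    return best


def best_pair_linear(peptides, target_bin, bin_width):
    """Same answer in one pass over peptides plus one pass over bins.

    Because the pair score is a plain sum, the best pair for a given
    (bin, complementary bin) combination is always formed by the best
    peptide in each bin, so nothing else has to be stored.
    """
    best_in_bin = {}
    for name, mass, score in peptides:
        b = to_bin(mass, bin_width)
        if b not in best_in_bin or score > best_in_bin[b][0]:
            best_in_bin[b] = (score, name)

    best = None
    for b, (sa, na) in best_in_bin.items():
        partner = best_in_bin.get(target_bin - b)  # complementary mass bin
        if partner is not None:
            sb, nb = partner
            if best is None or sa + sb > best[0]:
                best = (sa + sb, na, nb)
    return best


# Toy data: (name, mass, per-peptide score); values are made up.
peptides = [
    ("PEPA", 1000.0, 3.0),
    ("PEPB", 1500.0, 2.0),
    ("PEPC", 1500.0, 5.0),
    ("PEPD", 1200.0, 4.0),
    ("PEPE", 1300.0, 1.0),
]
linear_result = best_pair_linear(peptides, target_bin=2500, bin_width=1.0)
quadratic_result = best_pair_quadratic(peptides, target_bin=2500, bin_width=1.0)
```

On this toy input both functions select the PEPA and PEPC pair. The linear version touches each peptide once and each occupied bin once, whereas the baseline enumerates every pair. A peptide may pair with itself here when its bin is exactly half the target; whether that is desirable depends on the experiment.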

Provider:

ACS Publications
