Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity
Source: acs.figshare.com | Updated: 2023-06-01 | Indexed: 2025-03-22
Download link:
https://acs.figshare.com/articles/dataset/Exhaustively_Identifying_Cross-Linked_Peptides_with_a_Linear_Computational_Complexity/5367301/1
Description:
Chemical cross-linking coupled to mass spectrometry is a powerful
tool to study protein–protein interactions and protein conformations.
Two linked peptides are ionized and fragmented to produce a tandem
mass spectrum. In such an experiment, a tandem mass spectrum contains
ions from two peptides. The peptide identification problem becomes
a peptide–peptide pair identification problem. Currently, most
tools do not search all possible pairs due to the quadratic time complexity.
Consequently, missed findings are unavoidable. In our previous work,
we developed a tool named ECL to search all pairs of peptides exhaustively.
Unfortunately, it is very slow due to the quadratic computational
complexity, especially when the database is large. Furthermore, ECL
uses a score function without statistical calibration, while researchers have proposed that it is inappropriate to directly compare uncalibrated
scores because different spectra have different random score distributions.
Here we propose an advanced version of ECL, named ECL2. It achieves
a linear time and space complexity by taking advantage of the additive
property of a score function. It can search a data set containing
tens of thousands of spectra against a database containing thousands
of proteins in a few hours. A comparison with five other state-of-the-art
tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector,
and ECL. Kojak is the only one that is faster than ECL2, but Kojak
does not exhaustively search all possible peptide pairs. The comparison
shows that ECL2 has the highest sensitivity among the state-of-the-art
tools. An experiment using a large-scale in vivo cross-linking data
set demonstrates that ECL2 is the only tool that can find the peptide-spectrum
matches (PSMs) passing the false discovery rate/q-value threshold. The result illustrates that the exhaustive search
and a well-calibrated score function are useful to find PSMs from
a huge search space.
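
The abstract attributes the linear complexity to the additive property of the score function. The sketch below illustrates one way such a property can be exploited; it is not the ECL2 implementation. It assumes that the score of a peptide pair is the sum of the two chain scores, that chain masses have already been digitized into integer bins, and that a valid pair satisfies mass1 + mass2 + linker mass = precursor mass. The function name `best_pair` and the tuple layout are hypothetical.

```python
def best_pair(chains, precursor_mass, linker_mass):
    """Return (total_score, peptide_a, peptide_b) for the best pair, or None.

    chains: iterable of (peptide, mass_bin, score), where mass_bin is an
    integer (assumed pre-digitized mass) and score is the chain's score
    against the current spectrum (assumed additive across the two chains).
    """
    # Pass 1: keep only the best-scoring chain in each mass bin.
    # Because the pair score is a sum, no other chain in the same bin
    # can ever be part of the best pair.
    best_in_bin = {}
    for peptide, mass_bin, score in chains:
        if mass_bin not in best_in_bin or score > best_in_bin[mass_bin][1]:
            best_in_bin[mass_bin] = (peptide, score)

    # Pass 2: each bin has exactly one complementary bin, so a single
    # dictionary lookup per bin replaces the quadratic pair enumeration.
    target = precursor_mass - linker_mass
    best = None
    for mass_bin, (peptide, score) in best_in_bin.items():
        partner = best_in_bin.get(target - mass_bin)
        if partner is None:
            continue
        # Note: a chain may be paired with itself when its bin equals its
        # own complement; a real tool would decide whether to allow that.
        total = score + partner[1]
        if best is None or total > best[0]:
            best = (total, peptide, partner[0])
    return best


chains = [("A", 100, 1.0), ("B", 200, 2.0), ("C", 200, 3.0), ("D", 150, 0.5)]
result = best_pair(chains, precursor_mass=310, linker_mass=10)
```

Both passes visit each chain or bin once, so the work per spectrum grows linearly with the number of candidate chains rather than with the number of pairs. A mass tolerance would require checking a small, fixed number of neighboring bins, which does not change that growth rate.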
Provider:
ACS Publications



