
Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity

Source: acs.figshare.com. Updated 2023-06-05; indexed 2025-03-22.
Download link:
https://acs.figshare.com/articles/dataset/Exhaustively_Identifying_Cross-Linked_Peptides_with_a_Linear_Computational_Complexity/5367307/1
Resource description:
Chemical cross-linking coupled to mass spectrometry is a powerful tool for studying protein–protein interactions and protein conformations. Two linked peptides are ionized and fragmented to produce a tandem mass spectrum, so each spectrum contains ions from two peptides, and the peptide identification problem becomes a peptide–peptide pair identification problem. Currently, most tools do not search all possible pairs because doing so has quadratic time complexity. Consequently, missed findings are unavoidable. In our previous work, we developed a tool named ECL to search all pairs of peptides exhaustively. Unfortunately, it is very slow because of its quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, while researchers have proposed that it is inappropriate to directly compare uncalibrated scores because different spectra have different random score distributions. Here we propose an advanced version of ECL, named ECL2. It achieves linear time and space complexity by taking advantage of the additive property of a score function. It can search a data set containing tens of thousands of spectra against a database containing thousands of proteins in a few hours. A comparison with five other state-of-the-art tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only tool that is faster than ECL2, but Kojak does not exhaustively search all possible peptide pairs. The comparison also shows that ECL2 has the highest sensitivity among the state-of-the-art tools. An experiment using a large-scale in vivo cross-linking data set demonstrates that ECL2 is the only tool that can find peptide-spectrum matches (PSMs) that pass the false discovery rate/q-value threshold. This result illustrates that an exhaustive search and a well-calibrated score function are useful for finding PSMs in a huge search space.
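The abstract does not spell out how the additive property yields linear complexity, so the following is only a minimal sketch of one way it could work, not ECL2's actual implementation. It assumes that the score of a pair is the sum of two independently computed chain scores, and that a valid pair satisfies mass(a) + mass(b) + linker mass = precursor mass. Under those assumptions, it is enough to keep the best-scoring chain per mass bin and look up each bin's complement, instead of enumerating every pair. The function name, the `(name, mass, score)` tuple layout, and the `bin_width` parameter are all hypothetical.

```python
def best_pair_linear(chains, precursor_mass, linker_mass, bin_width=0.01):
    """Return (total_score, chain_a, chain_b) for the best pair, or None.

    chains: iterable of (name, mass, score) tuples, where each score was
    computed for a single chain against the spectrum (assumed additive).
    """
    # Pass 1 (linear): keep only the best-scoring chain in each mass bin.
    best_in_bin = {}
    for name, mass, score in chains:
        b = round(mass / bin_width)
        if b not in best_in_bin or score > best_in_bin[b][1]:
            best_in_bin[b] = (name, score)

    # Pass 2 (linear): for each occupied bin, look up the complementary bin.
    # Neighbouring bins are checked too, because rounding two masses
    # separately can shift their sum by one bin.
    target_bin = round((precursor_mass - linker_mass) / bin_width)
    best = None
    for b, (name_a, score_a) in best_in_bin.items():
        for offset in (-1, 0, 1):
            partner = best_in_bin.get(target_bin - b + offset)
            if partner is None:
                continue
            total = score_a + partner[1]  # additive pair score
            if best is None or total > best[0]:
                best = (total, name_a, partner[0])
    return best


# Toy data: masses and scores are made up for illustration only.
toy_chains = [
    ("A", 1000.0, 2.0),
    ("B", 1500.0, 3.5),
    ("C", 1500.0, 1.0),
    ("D", 1200.0, 5.0),
    ("E", 1300.0, 1.0),
]
result = best_pair_linear(toy_chains, precursor_mass=2638.0, linker_mass=138.0)
```

Both passes touch each chain or occupied bin a constant number of times, so the work grows linearly with the number of chains rather than with the number of pairs. The sketch lets a chain pair with itself and treats adjacent bins as matching, which widens the effective mass tolerance; a real tool would need to handle both points explicitly.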

Provider:
ACS Publications