Exhaustively Identifying Cross-Linked Peptides with a Linear Computational Complexity

NIAID Data Ecosystem, indexed 2026-03-10

Download link:

https://figshare.com/articles/dataset/Exhaustively_Identifying_Cross-Linked_Peptides_with_a_Linear_Computational_Complexity/5367307

Description:

Chemical cross-linking coupled to mass spectrometry is a powerful tool to study protein–protein interactions and protein conformations. Two linked peptides are ionized and fragmented together, so each tandem mass spectrum contains ions from both peptides, and the peptide identification problem becomes a peptide–peptide pair identification problem. Currently, most tools do not search all possible pairs because of the quadratic time complexity; consequently, some findings are inevitably missed. In our previous work, we developed a tool named ECL that searches all pairs of peptides exhaustively. Unfortunately, it is very slow because of its quadratic computational complexity, especially when the database is large. Furthermore, ECL uses a score function without statistical calibration, whereas researchers have proposed that uncalibrated scores should not be compared directly, because different spectra have different random score distributions. Here we propose an advanced version of ECL, named ECL2. It achieves linear time and space complexity by taking advantage of the additive property of a score function. It can search a data set containing tens of thousands of spectra against a database containing thousands of proteins in a few hours. Comparison with five other state-of-the-art tools shows that ECL2 is much faster than pLink, StavroX, ProteinProspector, and ECL. Kojak is the only tool faster than ECL2, but Kojak does not exhaustively search all possible peptide pairs. The comparison also shows that ECL2 has the highest sensitivity among these tools. An experiment using a large-scale in vivo cross-linking data set demonstrates that ECL2 is the only tool that finds peptide-spectrum matches (PSMs) passing the false discovery rate/q-value threshold. This result illustrates that exhaustive search and a well-calibrated score function are useful for finding PSMs in a huge search space.
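The abstract credits ECL2's linear complexity to the additive property of its score function but does not spell out the algorithm. The following is a minimal sketch of why additivity helps, not ECL2's published implementation. It assumes (hypothetically) that a pair's score is the sum of each peptide's individual score against the spectrum, and that a valid pair satisfies m_α + m_β + m_linker ≈ precursor mass within an absolute tolerance. Under those assumptions the best partner for each peptide α is simply the highest-scoring peptide inside a mass window, and a sliding-window maximum finds all of these in one pass instead of enumerating every pair. The function `best_pair` and its parameter names are illustrative.

```python
from collections import deque
from typing import Deque, Optional, Sequence, Tuple


def best_pair(masses: Sequence[float], scores: Sequence[float],
              precursor_mass: float, linker_mass: float,
              tol: float) -> Optional[Tuple[float, int, int]]:
    """Best additive pair score for one spectrum (illustrative sketch).

    masses: peptide masses sorted ascending (depends only on the database).
    scores: score of each peptide alone against this spectrum, same order.
    Returns (pair_score, alpha_index, beta_index), or None if no pair fits.
    """
    # Assumed mass model: m_alpha + m_beta + linker_mass == precursor_mass (+/- tol).
    target = precursor_mass - linker_mass
    window: Deque[int] = deque()  # beta indices; scores decrease front to back
    nxt = 0                       # next beta index not yet pushed into the window
    best: Optional[Tuple[float, int, int]] = None
    # Visit alpha from heaviest to lightest so the partner window
    # [target - m_alpha - tol, target - m_alpha + tol] only slides upward;
    # every beta is pushed and popped at most once, so the pass is O(n).
    for a in range(len(masses) - 1, -1, -1):
        low = target - masses[a] - tol
        high = target - masses[a] + tol
        # Admit betas whose mass is now at or below the window's upper edge.
        while nxt < len(masses) and masses[nxt] <= high:
            # Lower-or-equal scores behind a newer beta can never be the max again.
            while window and scores[window[-1]] <= scores[nxt]:
                window.pop()
            window.append(nxt)
            nxt += 1
        # Evict betas that fell below the window's lower edge.
        while window and masses[window[0]] < low:
            window.popleft()
        if window:
            b = window[0]  # highest-scoring partner within the mass window
            total = scores[a] + scores[b]
            if best is None or total > best[0]:
                best = (total, a, b)
    return best


# Toy input: four peptides, target pair mass 1638.0 - 138.0 = 1500.0.
print(best_pair([500.0, 700.0, 800.0, 1000.0], [1.0, 5.0, 2.0, 3.0],
                precursor_mass=1638.0, linker_mass=138.0, tol=0.01))
# prints (7.0, 2, 1): the 800.0 and 700.0 peptides, scoring 2.0 + 5.0
```

Because sorting by mass depends only on the database, it can be done once; per spectrum, scoring each peptide individually and the single window pass are both linear in the number of peptides. The sketch allows α and β to be the same peptide and keeps only the top-scoring pair per spectrum.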

Created: 2017-08-31