Flagging False Positives Following Untargeted LC–MS Characterization of Histone Post-Translational Modification Combinations

Source: NIAID Data Ecosystem (indexed 2026-03-09)

Download link:

https://figshare.com/articles/dataset/Flagging_False_Positives_Following_Untargeted_LC_MS_Characterization_of_Histone_Post-Translational_Modification_Combinations/4291298

Description:

Epigenetic changes can be studied with an untargeted characterization of histone post-translational modifications (PTMs) by liquid chromatography–mass spectrometry (LC–MS). While prior information about more than 20 types of histone PTMs exists, little is known about histone PTM combinations (PTMCs). Because of the combinatorial explosion, it is intrinsically impossible to consider all potential PTMCs in a database search. Consequently, high-scoring false positives can occur when the correct alternative isobaric PTMC was not considered in the search. Current quality controls can neither estimate the number of unconsidered alternatives nor flag potential false positives. Here, we propose a conceptual workflow that provides such options. In this workflow, all candidate isoforms with known-to-exist PTMs are modeled in silico. The most frequently occurring PTM sets of these candidate isoforms are determined and used in several database searches. This suppresses the combinatorial explosion while considering as many candidate isoforms as possible. Finally, annotations can be classified as unique or ambiguous, the latter implying potential false positives. This workflow was evaluated on an LC–MS data set containing 44 histone extracts. We were able to consider 60% of all candidate isoforms. Importantly, 40% of all annotations were classified as ambiguous. This highlights the need for a more thorough evaluation of modified peptide annotations.
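
The unique/ambiguous idea in the description can be illustrated with a toy sketch. This is not the authors' implementation: the site names, the PTM list, the function names, and the rule "ambiguous means at least one other candidate isoform has exactly the same total mass shift" are all assumptions made for illustration. The mass shifts are the usual monoisotopic values for methylation and acetylation, stored as integers in units of 1e-5 Da so that sums compare exactly.

```python
from collections import defaultdict
from itertools import product

# Monoisotopic mass shifts in units of 1e-5 Da (integers keep sums exact).
# None stands for an unmodified site. The PTM selection is a toy assumption.
PTM_MASS_1E5 = {None: 0, "me1": 1401565, "me2": 2803130, "me3": 4204695, "ac": 4201057}


def enumerate_isoforms(sites, ptms):
    """Return every assignment of one PTM (or None) to each site."""
    return [tuple(zip(sites, combo)) for combo in product(ptms, repeat=len(sites))]


def mass_shift(isoform):
    """Total mass shift of an isoform in units of 1e-5 Da."""
    return sum(PTM_MASS_1E5[ptm] for _, ptm in isoform)


def group_isobaric(isoforms):
    """Group candidate isoforms that share exactly the same total mass shift."""
    groups = defaultdict(list)
    for isoform in isoforms:
        groups[mass_shift(isoform)].append(isoform)
    return groups


def classify(isoform, groups):
    """Label an annotation 'unique' if no other candidate shares its mass shift."""
    return "unique" if len(groups[mass_shift(isoform)]) == 1 else "ambiguous"


# Hypothetical peptide with two modifiable lysines.
sites = ("K9", "K14")
ptms = (None, "me1", "me2", "me3", "ac")
isoforms = enumerate_isoforms(sites, ptms)
groups = group_isobaric(isoforms)

for isoform in isoforms:
    print(isoform, classify(isoform, groups))
```

Even in this two-site toy case most candidates have an isobaric alternative, either a positional isomer (K9me1 versus K14me1) or a different combination with the same composition (K9me2 versus K9me1 plus K14me1). A real analysis would also need a mass tolerance, since trimethylation and acetylation differ by only about 0.036 Da, and fragment-ion evidence to separate positional isomers.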

Created:

2016-12-07
