High-Throughput Non-targeted Chemical Structure Identification Using Gas-Phase Infrared Spectra

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://figshare.com/articles/dataset/High-Throughput_Non-targeted_Chemical_Structure_Identification_Using_Gas-Phase_Infrared_Spectra/15032080
Description:
The high-throughput identification of unknown metabolites in biological samples remains challenging. Most current non-targeted metabolomics studies rely on mass spectrometry, followed by computational methods that rank thousands of candidate structures by how closely their predicted mass spectra match the experimental mass spectrum of an unknown. We reasoned that infrared (IR) spectra could be used in an analogous manner and could add orthogonal structure discrimination; however, this has never been evaluated on large data sets. Here, we present results of a high-throughput computational method for predicting IR spectra of candidate compounds obtained from the PubChem database. Predicted spectra were ranked by their similarity to gas-phase experimental IR spectra of test compounds obtained from NIST. Our computational workflow (IRdentify) consists of a fast semiempirical quantum mechanical method for initial IR spectra prediction, ranking, and triaging, followed by a final IR spectra prediction and ranking using density functional theory. This approach correctly identified 47% of 258 test compounds. On average, 2152 candidate structures were evaluated for each test compound, giving a total of approximately 555,200 candidate structures evaluated. We discuss several variables that influenced the identification accuracy and then demonstrate the potential application of this approach in three areas: (1) combining IR and mass spectra rankings into a single composite rank score, (2) identifying the precursor and fragment ions using cryogenic ion vibrational spectroscopy, and (3) incorporating a trimethylsilyl derivatization step to extend the method's compatibility to less-volatile compounds.
Overall, our results suggest that matching computational with experimental IR spectra is a potentially powerful orthogonal option for adding significant high-throughput chemical structure discrimination when used with other non-targeted chemical structure identification methods.
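
The two-stage ranking and the composite rank score described above can be illustrated with a minimal sketch. It is not the IRdentify implementation: the description does not state which similarity metric or score-combination rule is used, so cosine similarity on equally binned spectra and a simple rank sum are assumptions made here for illustration. All function names (`cosine_similarity`, `rank_candidates`, `two_stage_rank`, `composite_rank`) and the toy spectra are hypothetical.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two spectra binned on the same grid (assumed metric)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def rank_candidates(experimental, predicted):
    """Return candidate names ordered from most to least similar to the experiment."""
    scores = {name: cosine_similarity(experimental, spec)
              for name, spec in predicted.items()}
    return sorted(scores, key=scores.get, reverse=True)


def two_stage_rank(experimental, fast_predictions, accurate_predictor, keep):
    """Triage with cheap predictions, then re-rank the shortlist with a costlier model.

    fast_predictions stands in for the semiempirical stage; accurate_predictor
    stands in for the density functional theory stage (both are placeholders).
    """
    shortlist = rank_candidates(experimental, fast_predictions)[:keep]
    refined = {name: accurate_predictor(name) for name in shortlist}
    return rank_candidates(experimental, refined)


def composite_rank(ir_ranking, ms_ranking):
    """Combine two rankings by summing rank positions (assumed rule; lower is better)."""
    ir_pos = {name: i for i, name in enumerate(ir_ranking, start=1)}
    ms_pos = {name: i for i, name in enumerate(ms_ranking, start=1)}
    common = set(ir_pos) & set(ms_pos)
    # Ties are broken alphabetically so the output is deterministic.
    return sorted(common, key=lambda n: (ir_pos[n] + ms_pos[n], n))


# Toy data: four-bin "spectra" for three hypothetical candidates.
experimental = [0, 1, 0, 2]
fast = {"A": [0, 1, 0, 2], "B": [1, 0, 1, 0], "C": [0, 1, 0, 1]}
accurate = {"A": [0, 1, 0, 2], "B": [1, 0, 1, 0], "C": [0, 2, 0, 1]}

first_pass = rank_candidates(experimental, fast)
final = two_stage_rank(experimental, fast, accurate.get, keep=2)
combined = composite_rank(first_pass, ["C", "A", "B"])
```

Here the first pass orders all candidates cheaply, only the top `keep` candidates are re-scored with the more expensive predictor, and `composite_rank` merges the IR ordering with a made-up mass-spectral ordering. In practice the shortlist size and the combination rule would need to be tuned against data.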

Created: 2021-07-21