HSQCid: A Powerful Tool for Paving the Way to High-Throughput Structural Dereplication of Natural Products Based on Fast NMR Experiments
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/HSQCid_A_Powerful_Tool_for_Paving_the_Way_to_High-Throughput_Structural_Dereplication_of_Natural_Products_Based_on_Fast_NMR_Experiments/28352113
Description:
Structural dereplication is an essential step in the study of natural
products (NPs). The number of discovered NPs is so large that efficient
dereplication is highly desirable. NMR spectroscopy is still the gold
standard for structural identification. 13C NMR spectra
are effective molecular fingerprints, but their acquisition is time-consuming,
especially for mass-limited NPs. Several alternative methods or tools
have been proposed, but none has reached general use, for various reasons.
Here, a new artificial intelligence tool, HSQCid, using contrastive
learning between 1H–13C HSQC spectra
and structures, is proposed for effective structural identification.
Two structure encoders are compared, and the graph neural network
is preferred over the Transformer. Of about 400,000 predicted data
entries, 80% were used for training and 20% for testing.
In addition, with 17,971 experimental data entries as an external test set, the
top-1 and top-10 accuracies reach 74.5% and 94.8%, respectively. Top-1
accuracy increases by at least 12% when combined with other easily
obtainable structure features, such as the total number of hydrogens
bonded to carbons, obtained from 1H NMR spectra. Further
analysis shows that filtering by structure features nearly eliminates
the influence (>10%) of the differences between predicted and experimental
data. Surprisingly, the influence of the number or the ratio of nonprotonated
carbons on the identification accuracy is only significant in specific
and rare cases (2.65%). Furthermore, a benchmark method that matches 13C peaks
was compared and proved markedly inferior to the proposed
method, with or without structure features. The HSQCid code is available
online. HSQCid is expected to help pave the way for
high-throughput or highly effective structural dereplication of NPs.
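
As a rough illustration of how top-k identification and a structure-feature filter interact, the sketch below ranks candidate structures by cosine similarity to a query embedding and optionally discards candidates whose count of carbon-bound hydrogens does not match. It is not the HSQCid implementation: the function names, the dictionary keys ("name", "vec", "n_ch_h"), the exact-match filter rule, and the toy two-dimensional embeddings are all assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(query_vec, candidates, n_ch_hydrogens=None):
    """Return candidate names sorted by descending similarity to the query.

    candidates: list of dicts with hypothetical keys "name", "vec", "n_ch_h".
    n_ch_hydrogens: if given, keep only candidates whose number of
    carbon-bound hydrogens matches exactly (an assumed filter rule).
    """
    pool = [
        c for c in candidates
        if n_ch_hydrogens is None or c["n_ch_h"] == n_ch_hydrogens
    ]
    pool.sort(key=lambda c: cosine(query_vec, c["vec"]), reverse=True)
    return [c["name"] for c in pool]


def top_k_accuracy(rankings, truths, k):
    """Fraction of queries whose true structure is among the first k hits."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Toy library with made-up embeddings; the true structure for the query is "B".
library = [
    {"name": "A", "vec": [1.0, 0.0], "n_ch_h": 10},
    {"name": "B", "vec": [0.9, 0.1], "n_ch_h": 12},
    {"name": "C", "vec": [0.0, 1.0], "n_ch_h": 12},
]
query = [1.0, 0.0]  # stand-in for a spectrum embedding

unfiltered = rank_candidates(query, library)
filtered = rank_candidates(query, library, n_ch_hydrogens=12)

print(unfiltered, top_k_accuracy([unfiltered], ["B"], k=1))
print(filtered, top_k_accuracy([filtered], ["B"], k=1))
```

In this toy case the wrong candidate "A" scores highest on similarity alone, and the hydrogen-count filter removes it so that "B" becomes the top hit. This mirrors, in miniature, why an easily measured structure feature can raise top-1 accuracy.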
Created:
2025-02-05



