
HSQCid: A Powerful Tool for Paving the Way to High-Throughput Structural Dereplication of Natural Products Based on Fast NMR Experiments

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/HSQCid_A_Powerful_Tool_for_Paving_the_Way_to_High-Throughput_Structural_Dereplication_of_Natural_Products_Based_on_Fast_NMR_Experiments/28352113
Description:
Structural dereplication is an essential step in the study of natural products (NPs). The number of discovered NPs is so large that efficient dereplication is highly desirable. NMR spectroscopy remains the gold standard for structural identification. 13C NMR spectra are effective molecular fingerprints, but their acquisition is time-consuming, especially for mass-limited NPs. Several alternative methods and tools have been proposed, but none has reached general use. Here, a new artificial intelligence tool, HSQCid, which uses contrastive learning between 1H–13C HSQC spectra and structures, is proposed for effective structural identification. Two structure encoders are compared, and the graph neural network is preferred over the Transformer. About 400,000 predicted data entries are split 80%/20% for training and testing, respectively. On an external test set of 17,971 experimental data entries, the top-1 and top-10 accuracies reach 74.5% and 94.8%, respectively. Top-1 accuracy increases by at least 12% when HSQCid is combined with other easily obtainable structure features, such as the total number of hydrogens connected to carbons from 1H NMR spectra. Further analysis shows that filtering by structure features nearly eliminates the influence (>10%) of the difference between predicted and experimental data. Surprisingly, the number or ratio of nonprotonated carbons affects identification accuracy significantly only in specific and rare cases (2.65%). A benchmark method that matches 13C peaks is also evaluated and is markedly inferior to the proposed method, with or without structural features. The HSQCid code is available online. HSQCid is expected to help pave the way for high-throughput, highly effective structural dereplication of NPs.
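The retrieval step and the top-k metric mentioned in the description can be illustrated with a minimal sketch. It is not the HSQCid implementation: the embeddings are toy 2-D vectors standing in for the outputs of the spectrum and structure encoders, the function names (`cosine`, `rank_candidates`, `top_k_accuracy`) are hypothetical, and the exact-match filter on the number of carbon-bound hydrogens is an assumption about how a structure-feature filter might be applied.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(query_emb, cand_embs, query_h=None, cand_h=None):
    """Return candidate indices sorted by decreasing similarity to the query.

    If hydrogen counts are given, keep only candidates whose count of
    carbon-bound hydrogens equals the query's (assumed filter rule).
    """
    indices = list(range(len(cand_embs)))
    if query_h is not None and cand_h is not None:
        indices = [i for i in indices if cand_h[i] == query_h]
    return sorted(indices, key=lambda i: cosine(query_emb, cand_embs[i]), reverse=True)

def top_k_accuracy(rankings, true_indices, k):
    """Fraction of queries whose true candidate appears in the first k ranks."""
    hits = sum(1 for r, t in zip(rankings, true_indices) if t in r[:k])
    return hits / len(true_indices)

# Toy structure embeddings and their carbon-bound hydrogen counts (made up).
cand_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
cand_h = [10, 12, 12]

# Each query: (spectrum embedding, hydrogen count, index of the true structure).
queries = [([0.98, 0.02], 12, 1), ([0.1, 0.9], 12, 2)]
truth = [t for _, _, t in queries]

plain = [rank_candidates(q, cand_embs) for q, _, _ in queries]
filtered = [rank_candidates(q, cand_embs, h, cand_h) for q, h, _ in queries]

top1_plain = top_k_accuracy(plain, truth, 1)
top2_plain = top_k_accuracy(plain, truth, 2)
top1_filtered = top_k_accuracy(filtered, truth, 1)
print(top1_plain, top2_plain, top1_filtered)
```

In this toy case the first query is ranked closest to a wrong candidate until the hydrogen-count filter removes that candidate, which mirrors in miniature how an extra structure feature can raise top-1 accuracy.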
Created:
2025-02-05