ACL_datasets
Source: Figshare. Updated: 2025-03-13. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/ACL_datasets/28588703
Description:
The chemical space of substances covers about 10^60 known and unknown compound structures, including chemicals of concern that occur in the environmental exposome. Non-target analysis (NTA) using liquid chromatography (LC) with high-resolution mass spectrometry (HRMS) is the key methodology for comprehensively detecting such high chemical variability in complex biological and environmental samples. Unfortunately, the tentative and known structures identified by NTA LC–HRMS cover less than 1% of the vast chemical space available. The complex data acquisition and analysis of NTA workflows contribute to this limitation. Although a reliable LC–HRMS acquisition is essential for ensuring the quality and quantity of detectable structures, methods are often developed and validated on groups of target chemicals of interest (e.g., pharmaceuticals and/or personal care products) with limited structural and physicochemical variability. This bias reduces the detectable chemical space in the chromatography and MS domains. Prior knowledge of the chemical space region amenable to the developed NTA LC–HRMS methods is nevertheless crucial for compound detection and identification confidence. To expand the detectable chemical space in NTA, we present an innovative workflow focused on unbiased sampling of compound structures for LC–HRMS method development from a vast chemical space of interest, such as the USEPA CompTox Chemistry Dashboard (>1 million chemicals). This workflow uses multivariate, machine learning classification, and prediction models to mine structural candidates that maximize chemical space coverage. Accordingly, amenable compound lists (ACLs) are sampled based on major PubChem physicochemical variables (e.g., molecular weight and XLogP) and on environmental mobility and ionization efficiency (IE) predicted with structural fingerprint models, ensuring the selection of heterogeneous structures compatible with LC–HRMS analysis.
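As a rough illustration of what sampling candidates evenly across physicochemical variables can look like, the sketch below bins compounds on a molecular weight by XLogP grid and draws a fixed number from each occupied cell, after discarding compounds with a low predicted logIE. This is not the authors' workflow, which relies on multivariate and machine learning models; the function name `sample_acl`, the record fields, the bin widths, the cutoff value, and the demo values are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict


def sample_acl(compounds, mw_step=100.0, xlogp_step=1.0,
               per_bin=1, min_log_ie=3.5, seed=0):
    """Draw up to `per_bin` compounds from each occupied (MW, XLogP) grid cell.

    `compounds` is a list of dicts with hypothetical keys
    "name", "mw", "xlogp" and "log_ie" (a predicted value).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    bins = defaultdict(list)
    for c in compounds:
        # Illustrative cutoff: skip compounds predicted to ionize poorly.
        if c["log_ie"] <= min_log_ie:
            continue
        # Floor division assigns negative XLogP values to negative cells.
        key = (int(c["mw"] // mw_step), int(c["xlogp"] // xlogp_step))
        bins[key].append(c)

    selected = []
    for key in sorted(bins):  # sorted for a deterministic cell order
        members = bins[key]
        selected.extend(rng.sample(members, min(per_bin, len(members))))
    return selected


# Made-up records, not taken from the dataset.
demo = [
    {"name": "A", "mw": 150.0, "xlogp": -0.5, "log_ie": 4.0},
    {"name": "B", "mw": 160.0, "xlogp": -0.2, "log_ie": 4.2},
    {"name": "C", "mw": 320.0, "xlogp": 2.3, "log_ie": 3.9},
    {"name": "D", "mw": 330.0, "xlogp": 2.7, "log_ie": 2.0},
    {"name": "E", "mw": 510.0, "xlogp": 5.1, "log_ie": 3.6},
]

acl = sample_acl(demo)
names = [c["name"] for c in acl]
print(names)
```

In the demo, A and B share a grid cell, so only one of them is kept, and D is dropped by the cutoff. Sampling per cell rather than from the whole pool keeps sparsely populated regions of the grid represented, which is the general idea behind coverage-oriented sampling.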
The unbiased sampling of the CompTox chemical space generated ACLs containing more than ten thousand candidate structures compatible with LC–HRMS analysis in both positive and negative ionization modes (logIE > 3.5 and
Created:
2025-03-13



