ACL_datasets
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-05-07.
Download link:
https://figshare.com/articles/dataset/ACL_datasets/28588703/1
Description:
The chemical space of substances covers about 10^60 known and unknown compound structures, including chemicals of concern that occur in the environmental exposome. Non-target analysis (NTA) using liquid chromatography (LC) with high-resolution mass spectrometry (HRMS) is the key methodology for comprehensively detecting such high chemical variability in complex biological and environmental samples. Unfortunately, the tentative and known structures identified by NTA LC–HRMS cover less than 1% of the vast chemical space available. The complex data acquisition and analysis of NTA workflows contribute to this limitation. Although implementing a reliable LC–HRMS acquisition is essential for ensuring the quality and quantity of detectable structures, methods are often developed and validated on groups of target chemicals of interest (e.g., pharmaceuticals and/or personal care products) with limited structural and physicochemical variability. This bias reduces the detectable chemical space in both the chromatography and MS domains. However, prior knowledge of the chemical space region amenable to the developed NTA LC–HRMS methods is crucial for compound detection and identification confidence. To expand the detectable chemical space in NTA, we present an innovative workflow focused on unbiased sampling of compound structures for LC–HRMS method development from a vast chemical space of interest, such as the USEPA CompTox Chemistry Dashboard (>1 million chemicals). This workflow uses multivariate, machine learning classification, and prediction models to mine structural candidates that maximize chemical space coverage. Accordingly, amenable compound lists (ACLs) are effectively sampled based on major PubChem physicochemical variables (e.g., molecular weight and XLogP) and on predicted environmental mobility and ionization efficiency (IE) using structural fingerprint models, ensuring the selection of heterogeneous structures compatible with LC–HRMS analysis.
The unbiased sampling of the CompTox chemical space generated ACLs containing more than ten thousand candidate structures compatible with LC–HRMS analysis in both positive and negative ionization modes (logIE > 3.5 and < 1.5, respectively). ACL subsets (n = 300) exhibited significant chemical space coverage in terms of mass range (up to 1200 Da), predicted retention index (from 5 to 900), and structural variability, including chemical classes in common with the European “watch lists” under the water monitoring framework. As a result, the proposed unbiased selection of ACLs from complex chemical spaces (exposomics, metabolomics, etc.) can potentially enhance NTA LC–HRMS detection coverage and can assist the implementation of analysis methods consistent with the continuous expansion of the chemical space.
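The idea of drawing a small, structurally spread subset from a larger list of amenable candidates can be illustrated with a minimal sketch. This is not the authors' workflow: it swaps in a plain greedy max-min (farthest-point) selection on two min-max scaled descriptors, runs on synthetic values, and uses hypothetical field names (`mw`, `xlogp`, `log_ie`). Only the positive-mode cutoff (logIE > 3.5) and the subset size (300) are taken from the description above.

```python
import math
import random


def min_max_scale(rows):
    """Scale each descriptor column to [0, 1] so no variable dominates distances."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # avoid division by zero
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def maxmin_sample(points, k, start=0):
    """Greedy max-min selection: repeatedly pick the point farthest from those chosen."""
    k = min(k, len(points))
    if k == 0:
        return []
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    nearest = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], math.dist(p, points[nxt]))
    return chosen


# Synthetic stand-in for a candidate table (hypothetical fields, made-up values).
rng = random.Random(0)
compounds = [
    {"mw": rng.uniform(50, 1200), "xlogp": rng.uniform(-5, 10), "log_ie": rng.uniform(0, 6)}
    for _ in range(2000)
]

# Assumed positive-mode amenability filter, using the cutoff quoted in the description.
amenable = [c for c in compounds if c["log_ie"] > 3.5]

scaled = min_max_scale([[c["mw"], c["xlogp"]] for c in amenable])
subset = [amenable[i] for i in maxmin_sample(scaled, k=300)]
```

Max-min selection favors candidates at the edges and in sparse regions of descriptor space, which is one simple way to keep a 300-compound subset from clustering around the most common structures. A real application would replace the synthetic table with the ACL files from the figshare record and likely use more descriptors.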
Provider:
figshare
Created:
2025-03-13



