Modelling of ready biodegradability based on combined public and industrial data sources
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/record/3466618
Description:
The European REACH (Registration, Evaluation, Authorisation and Restriction of Chemicals) Regulation requires marketed chemicals to be evaluated for Ready Biodegradability (RB). In silico prediction is a valid alternative to expensive and time-consuming experimental testing. However, currently available models may not be suitable for predicting compounds of industrial interest, owing to limited accuracy and restricted applicability domains.
In this work we present a new and extended RB dataset (2830 compounds), obtained by merging several public data sources. It was used to train classification models, which were externally validated and benchmarked against existing tools on a set of 316 compounds from the industrial context. The new models performed well in terms of both predictive power (balanced accuracy, BA = 0.74–0.79) and data coverage (83–91 %).
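The two figures of merit quoted above can be illustrated with a minimal sketch. It assumes that BA is the usual balanced accuracy (the mean of sensitivity and specificity) and that data coverage is the fraction of compounds falling inside a model's applicability domain; the record itself does not spell out either definition. The function names and the convention of marking out-of-domain compounds with `None` are hypothetical and are not taken from the authors' code.

```python
from typing import Optional, Sequence, Tuple


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of sensitivity and specificity for binary labels (1 = readily biodegradable)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def evaluate(
    y_true: Sequence[int], y_pred: Sequence[Optional[int]]
) -> Tuple[float, float]:
    """Return (balanced accuracy, coverage).

    Assumption: a prediction of None means the compound lies outside the
    applicability domain; it lowers coverage and is excluded from BA.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    covered = [(t, p) for t, p in zip(y_true, y_pred) if p is not None]
    coverage = len(covered) / len(y_true)
    ba = balanced_accuracy([t for t, _ in covered], [p for _, p in covered])
    return ba, coverage


if __name__ == "__main__":
    # Toy labels for illustration only; these are not values from the dataset.
    truth = [1, 1, 1, 0, 0, 0, 1, 0]
    preds = [1, 1, 0, 0, 0, 1, None, None]
    ba, cov = evaluate(truth, preds)
    print(f"BA = {ba:.2f}, coverage = {cov:.0%}")
```

Reporting the two numbers together matters because a model can raise its BA simply by declining to predict difficult compounds, which shows up as lower coverage.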
The Generative Topographic Mapping approach was used to compare the chemical space covered by the various data sources. Several chemotypes and structural motifs unique to the industrial dataset were identified, pointing to the chemical classes for which currently available models may give less reliable predictions.
Finally, public and industrial data were merged into a Global dataset of 3146 compounds, which includes a significant subset of compounds from the industrial context. It is the largest dataset reported in the literature so far and covers some chemotypes absent from the public data. The predictive model developed on the Global dataset therefore has a much larger applicability domain than related models built on publicly available data. The developed model is available to users on the Laboratory of Chemoinformatics website.
This record contains only the "All-Public" set, since the industrial compounds cannot be disclosed.
This update contains additional entries from [J. Chem. Inf. Model. 52 (2012), pp. 655–669] and [J. Chem. Inf. Model. 53 (2013), pp. 867–878].
Created:
2020-01-24



