Transparency in Modeling through Careful Application of OECD’s QSAR/QSPR Principles via a Curated Water Solubility Data Set
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Transparency_in_Modeling_through_Careful_Application_of_OECD_s_QSAR_QSPR_Principles_via_a_Curated_Water_Solubility_Data_Set/22222742
Resource description:
The need for careful assembly, training, and validation of quantitative structure–activity/property relationship (QSAR/QSPR) models is more significant than ever as data sets grow larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Co-operation and Development (OECD) and discuss its validation principles for structure–activity models. We apply these principles to a random forest regression model, a common machine learning approach in the QSAR/QSPR literature, that predicts the water solubility of organic compounds. Using public sources, we carefully assembled and curated a data set of 10,200 unique chemical structures with associated water solubility measurements. This data set then served as a focal narrative for methodically considering the OECD QSAR/QSPR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a water solubility model with performance comparable to previously published models (5-fold cross-validated R² of 0.81 and RMSE of 0.98). We hope this work will catalyze a necessary conversation about the importance of cautiously modernizing and explicitly leveraging the OECD principles while pursuing state-of-the-art machine learning approaches to derive QSAR/QSPR models suitable for regulatory consideration.
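Performance figures of the kind quoted above (5-fold cross-validated R² and RMSE for a random forest regressor) can be computed along the following lines. This is a minimal sketch, assuming scikit-learn's `RandomForestRegressor` and a precomputed numeric descriptor matrix; the helper name `cv_random_forest`, the hyperparameters, and the synthetic data from `make_regression` are illustrative assumptions, not the authors' descriptor set or pipeline.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate


def cv_random_forest(X, y, n_splits=5, seed=0):
    """Hypothetical helper: mean k-fold cross-validated R^2 and RMSE of a random forest."""
    # Illustrative hyperparameters only; not taken from the source.
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    # Shuffle before splitting so folds are not ordered by data source or compound class.
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = cross_validate(
        model,
        X,
        y,
        cv=folds,
        scoring={"r2": "r2", "rmse": "neg_root_mean_squared_error"},
    )
    # scikit-learn reports RMSE negated (so that higher is better); flip the sign back.
    mean_r2 = float(np.mean(scores["test_r2"]))
    mean_rmse = float(-np.mean(scores["test_rmse"]))
    return mean_r2, mean_rmse


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix X and solubility targets y (assumption).
    X, y = make_regression(n_samples=500, n_features=20, noise=0.5, random_state=0)
    r2, rmse = cv_random_forest(X, y)
    print(f"5-fold CV: R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

This sketch averages the per-fold scores; pooling all out-of-fold predictions and scoring them once is another common convention, and the description above does not state which one produced the reported values.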
Created:
2023-03-06



