Critically Assessing the Predictive Power of QSAR Models for Human Liver Microsomal Stability
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Critically_Assessing_the_Predictive_Power_of_QSAR_Models_for_Human_Liver_Microsomal_Stability/2138476
Resource description:
To lower the possibility of late-stage failures in the drug development process, an up-front assessment of absorption, distribution, metabolism, elimination, and toxicity is commonly implemented through a battery of in silico and in vitro assays. As in vitro data are accumulated, in silico quantitative structure–activity relationship (QSAR) models can be trained and used to assess compounds even before they are synthesized. Even though it is generally recognized that QSAR model performance deteriorates over time, rigorous independent studies of this deterioration are typically hindered by the lack of publicly available large data sets of structurally diverse compounds. Here, we investigated the predictive properties of QSAR models derived from an assembly of publicly available human liver microsomal (HLM) stability data using variable nearest neighbor (v-NN) and random forest (RF) methods. In particular, we evaluated the degree of time-dependent model performance deterioration. Our results show that when evaluated by 10-fold cross-validation with all available HLM data randomly distributed among 10 equal-sized validation groups, both machine-learning methods achieved high-quality model performance. However, when we developed HLM models from data available up to a given publication date and tried to predict data published later, we found that neither method produced predictive models and that their applicability was dramatically reduced. On the other hand, when a small percentage of randomly selected compounds from data published later were included in the training set, the performance of both machine-learning methods improved significantly. The implication is that 1) QSAR model quality should be analyzed in a time-dependent manner to assess the models' true predictive power and 2) it is imperative to retrain models with any up-to-date experimental data to ensure maximum applicability.
Created:
2016-02-13



