Data Quality in the Human and Environmental Health Sciences: Using Statistical Confidence Scoring to Improve QSAR/QSPR Modeling
Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Data_Quality_in_the_Human_and_Environmental_Health_Sciences_Using_Statistical_Confidence_Scoring_to_Improve_QSAR_QSPR_Modeling/2138488
Description:
An increasing amount of toxicity data
is becoming publicly available,
allowing for in silico modeling. However, questions often arise as
to how to incorporate data quality and how to handle contradictory
data when more than one data point is available for the same compound.
In this study, two well-known and widely studied QSAR/QSPR models, one for skin
permeability and one for aquatic toxicology, were investigated in the
context of statistical data quality. In particular, the study assessed the potential
benefits of incorporating the statistical Confidence Scoring
(CS) approach into modeling and validation. As a result, robust
QSAR/QSPR models for the skin permeability coefficient and the toxicity
of nonpolar narcotics in the Aliivibrio fischeri assay were created. CS-weighted linear regression for training and
CS-weighted root-mean-square error (RMSE) for validation were statistically
superior to standard linear regression and standard RMSE.
Strategies are proposed as to how to interpret data with high and
low CS, as well as how to deal with large data sets containing multiple
entries.
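
The description above does not give the formulas used, so the following is only a minimal sketch of what CS-weighted linear regression and a CS-weighted RMSE could look like. It assumes that each data point carries a non-negative confidence score that is used directly as its weight, and that the weighted RMSE is normalized by the sum of the scores. The function names `cs_weighted_fit` and `cs_weighted_rmse` are hypothetical and are not taken from the dataset or the study.

```python
import numpy as np


def cs_weighted_fit(descriptors, y, cs):
    """Weighted least squares: minimize sum(cs_i * residual_i**2).

    descriptors: shape (n,) or (n, k); y: shape (n,); cs: shape (n,).
    Returns the coefficients, with the intercept as the last element.
    Assumption: the confidence score is used directly as the weight.
    """
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(cs, dtype=float))
    # Append a column of ones so the model includes an intercept.
    X = np.column_stack([np.asarray(descriptors, dtype=float), np.ones(len(y))])
    # Scaling rows by sqrt(weight) turns weighted into ordinary least squares.
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


def cs_weighted_rmse(y_true, y_pred, cs):
    """RMSE in which each squared error is weighted by its confidence score."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    cs = np.asarray(cs, dtype=float)
    return float(np.sqrt(np.sum(cs * (y_true - y_pred) ** 2) / np.sum(cs)))
```

Under these assumptions, a point with a low CS pulls the fitted line less and contributes less to the validation error than a point with a high CS. Setting all scores equal recovers standard linear regression and standard RMSE.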
Created:
2016-02-13



