
Data Quality in the Human and Environmental Health Sciences: Using Statistical Confidence Scoring to Improve QSAR/QSPR Modeling

Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/Data_Quality_in_the_Human_and_Environmental_Health_Sciences_Using_Statistical_Confidence_Scoring_to_Improve_QSAR_QSPR_Modeling/2138488
Description:
A growing volume of toxicity data is becoming publicly available, enabling in silico modeling. However, questions often arise as to how to account for data quality and how to handle contradictory data when more than one data point is available for the same compound. In this study, two well-known and well-studied QSAR/QSPR models, for skin permeability and aquatic toxicology, were investigated in the context of statistical data quality. In particular, the study examined the potential benefits of incorporating the statistical Confidence Scoring (CS) approach into modeling and validation. As a result, robust QSAR/QSPR models were created for the skin permeability coefficient and for the toxicity of nonpolar narcotics in the Aliivibrio fischeri assay. CS-weighted linear regression for training and CS-weighted root-mean-square error (RMSE) for validation were statistically superior to standard linear regression and standard RMSE. Strategies are proposed for interpreting data with high and low CS, and for dealing with large data sets containing multiple entries.
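To make the idea of CS-weighted training and validation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: this page does not say how a Confidence Score is computed, so the sketch simply assumes each data point carries a non-negative score that is used directly as a weight. The function names (`weighted_mean`, `cs_weighted_fit`, `cs_weighted_rmse`) and the single-descriptor model are hypothetical choices made for illustration.

```python
from math import sqrt


def weighted_mean(values, weights):
    """Weighted average; weights are assumed non-negative (hypothetical CS values)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for v, w in zip(values, weights)) / total


def cs_weighted_fit(x, y, cs):
    """Fit y = slope * x + intercept by minimizing sum(cs_i * residual_i ** 2).

    Closed-form weighted least squares for one descriptor. Treating the
    confidence score as the regression weight is an assumption of this sketch.
    """
    xm = weighted_mean(x, cs)
    ym = weighted_mean(y, cs)
    sxx = sum(w * (xi - xm) ** 2 for xi, w in zip(x, cs))
    sxy = sum(w * (xi - xm) * (yi - ym) for xi, yi, w in zip(x, y, cs))
    if sxx == 0:
        raise ValueError("x has no weighted variance; slope is undefined")
    slope = sxy / sxx
    return slope, ym - slope * xm


def cs_weighted_rmse(y_true, y_pred, cs):
    """Square root of the weighted mean of squared errors."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sqrt(weighted_mean(squared_errors, cs))


# Illustrative values only; these are not data from the study.
x = [0.5, 1.0, 1.5, 2.0]
y = [1.1, 1.9, 3.2, 3.9]
cs = [0.9, 0.4, 0.8, 1.0]
slope, intercept = cs_weighted_fit(x, y, cs)
predictions = [slope * xi + intercept for xi in x]
print(slope, intercept, cs_weighted_rmse(y, predictions, cs))
```

With equal scores for every point, both functions reduce to ordinary least squares and standard RMSE, which is what makes a like-for-like comparison between the weighted and unweighted variants possible.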
Created:
2016-02-13