The Effect of Debiasing Protein–Ligand Binding Data on Generalization

Source: Figshare. Updated 2019-12-11; indexed 2026-04-29.
Download link:
https://figshare.com/articles/dataset/The_Effect_of_Debiasing_Protein_Ligand_Binding_Data_on_Generalization/11446143
Description:
The structured nature of chemical data means machine-learning models trained to predict protein–ligand binding risk overfitting the data, impairing their ability to generalize and make accurate predictions for novel candidate ligands. Data debiasing algorithms, which systematically partition the data to reduce bias and provide a more accurate metric of model performance, have the potential to address this issue. When models are trained using debiased data splits, the reward for simply memorizing the training data is reduced, suggesting that the ability of the model to make accurate predictions for novel candidate ligands will improve. To test this hypothesis, we use distance-based data splits to measure how well a model can generalize. We first confirm that models perform better for randomly split held-out sets than for distant held-out sets. We then debias the data and find, surprisingly, that debiasing typically reduces the ability of models to make accurate predictions for distant held-out test sets and that model performance measured after debiasing is not representative of the ability of a model to generalize. These results suggest that debiasing reduces the information available to a model, impairing its ability to generalize.
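To illustrate the difference between a randomly split held-out set and a distant held-out set, here is a minimal Python sketch. It is not the authors' implementation. It assumes each ligand is represented by a set of integer feature IDs (a stand-in for a molecular fingerprint), uses Tanimoto distance as the dissimilarity measure, and builds the distant test set by holding out the ligands farthest from an arbitrary anchor ligand. The function names, the anchor-based ranking, and the toy data are all hypothetical.

```python
import random


def tanimoto_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two feature sets (0.0 if both are empty)."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def random_split(n_items, n_test, seed=0):
    """Shuffle the indices and hold out the first n_test of them."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def distant_split(fingerprints, n_test, anchor=0):
    """Hold out the n_test items farthest from the anchor item (assumed heuristic)."""
    order = sorted(
        range(len(fingerprints)),
        key=lambda i: tanimoto_distance(fingerprints[anchor], fingerprints[i]),
    )
    n_train = len(order) - n_test
    return sorted(order[:n_train]), sorted(order[n_train:])


def min_train_distance(fingerprints, train, test):
    """For each test item, the distance to its nearest training item."""
    return [
        min(tanimoto_distance(fingerprints[t], fingerprints[r]) for r in train)
        for t in test
    ]


# Toy data (hypothetical): four ligands share features, two form a separate cluster.
fps = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}, {7, 8, 9}, {7, 8, 10}, {2, 3, 4}]

rand_train, rand_test = random_split(len(fps), n_test=2)
dist_train, dist_test = distant_split(fps, n_test=2)

# A distant split keeps every test ligand far from all training ligands.
rand_gap = min_train_distance(fps, rand_train, rand_test)
dist_gap = min_train_distance(fps, dist_train, dist_test)
```

On this toy data the distant split holds out the two ligands that share no features with the rest, so each has a nearest-training-ligand distance of 1.0. A random split gives no such guarantee, which is why scores on randomly held-out sets can overstate how well a model generalizes.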
Created:
2019-12-11