
Systematic Investigation of the Data Set Dependency of Protein Stability Predictors

Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Systematic_Investigation_of_the_Data_Set_Dependency_of_Protein_Stability_Predictors/12857522
Description:
Prediction of protein stability changes caused by mutation is of major importance to protein engineering and for understanding protein misfolding diseases and protein evolution. The major limitation to these applications is the fact that different prediction methods vary substantially in terms of performance for specific proteins; i.e., performance is not transferable from one type of mutation or protein to another. In this study, we investigated the performance and transferability of eight widely used methods. We first constructed a new data set composed of 2647 mutations using strict selection criteria for the experimental data and then defined a variety of subdata sets that are unbiased with respect to various aspects such as mutation type, stabilization extent, structure type, and solvent exposure. Benchmarking the methods against these subdata sets enabled us to systematically investigate how data set biases affect predictor performance. In particular, we use a reduced amino acid alphabet to quantify the bias toward mutation type, which we identify as the major bias in current approaches. Our results show that all prediction methods exhibit large biases, stemming not from failures of the models applied but mostly from the selection biases of experimental data used for training or parametrization. Our identification of these biases and the construction of new mutation-type-balanced data should lead to the development of more balanced and transferable prediction methods in the future.
Created:
2020-08-10
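
The description mentions using a reduced amino acid alphabet to quantify bias toward mutation type. The following is a minimal Python sketch of that general idea, not the study's actual procedure: the six-class grouping in `REDUCED_ALPHABET`, the `A123G`-style mutation strings, and the normalized-entropy balance score are all assumptions introduced here for illustration.

```python
from collections import Counter
import math

# Hypothetical six-class reduced alphabet covering the 20 standard residues.
# The grouping used in the study is not given in this entry.
REDUCED_ALPHABET = {
    "aliphatic": "AVLIMC",
    "aromatic": "FWYH",
    "polar": "STNQ",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}
RESIDUE_CLASS = {
    aa: name for name, letters in REDUCED_ALPHABET.items() for aa in letters
}


def mutation_type(mutation: str) -> tuple[str, str]:
    """Map a mutation such as 'A123G' to a (wild-type class, mutant class) pair."""
    wild, mutant = mutation[0].upper(), mutation[-1].upper()
    return RESIDUE_CLASS[wild], RESIDUE_CLASS[mutant]


def type_counts(mutations) -> Counter:
    """Count how often each reduced-alphabet mutation type occurs."""
    return Counter(mutation_type(m) for m in mutations)


def normalized_entropy(counts: Counter, n_types: int) -> float:
    """Illustrative balance score: 1.0 means all n_types are equally
    represented, values near 0.0 mean a few types dominate."""
    total = sum(counts.values())
    if total == 0 or n_types < 2:
        return 0.0
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values() if c > 0
    )
    return entropy / math.log(n_types)


if __name__ == "__main__":
    # Made-up mutation labels, used only to exercise the functions.
    sample = ["L45A", "I12A", "V88A", "K30E"]
    counts = type_counts(sample)
    n_possible = len(REDUCED_ALPHABET) ** 2  # all ordered class pairs
    print(counts)
    print(round(normalized_entropy(counts, n_possible), 3))
```

A score like this only summarizes how evenly the mutation types are represented in a list of mutations; it says nothing about predictor accuracy on each type, which is what the benchmarking described above measures.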