Systematic Investigation of the Data Set Dependency of Protein Stability Predictors
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Systematic_Investigation_of_the_Data_Set_Dependency_of_Protein_Stability_Predictors/12857522
Description:
Prediction of protein stability changes caused by mutation is of
major importance to protein engineering and for understanding protein
misfolding diseases and protein evolution. The major limitation to
these applications is the fact that different prediction methods vary
substantially in terms of performance for specific proteins; i.e.,
performance is not transferable from one type of mutation or protein
to another. In this study, we investigated the performance and transferability
of eight widely used methods. We first constructed a new data set
composed of 2647 mutations using strict selection criteria for the
experimental data and then defined a variety of subdata sets that
are unbiased with respect to various aspects such as mutation type,
stabilization extent, structure type, and solvent exposure. Benchmarking
the methods against these subdata sets enabled us to systematically
investigate how data set biases affect predictor performance. In particular,
we use a reduced amino acid alphabet to quantify the bias toward mutation
type, which we identify as the major bias in current approaches. Our
results show that all prediction methods exhibit large biases, stemming
not from failures of the models applied but mostly from the selection
biases of experimental data used for training or parametrization.
Our identification of these biases and the construction of new mutation-type-balanced
data should lead to the development of more balanced and transferable
prediction methods in the future.
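
As a rough illustration of how a reduced amino acid alphabet can be used to quantify mutation-type bias, the Python sketch below maps each mutation to a (wild-type class, mutant class) pair and summarizes how evenly a data set covers those pairs. The five-class grouping, the mutation string format (for example "L45A"), and the function names are all assumptions made for this example; the description above does not specify the alphabet or the bias measure used in the study.

```python
from collections import Counter
import math

# Hypothetical five-class reduced alphabet (an assumption for illustration;
# the grouping actually used in the study is not given in this description).
REDUCED_ALPHABET = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYH",
    "positive": "KR",
    "negative": "DE",
    "special": "GP",
}

# Reverse lookup: one-letter amino acid code -> reduced class name.
AA_TO_CLASS = {aa: cls for cls, aas in REDUCED_ALPHABET.items() for aa in aas}


def mutation_type(mutation):
    """Map a mutation such as 'L45A' to (wild-type class, mutant class)."""
    wild_type, mutant = mutation[0], mutation[-1]
    return (AA_TO_CLASS[wild_type], AA_TO_CLASS[mutant])


def type_distribution(mutations):
    """Return the fraction of mutations falling into each mutation type."""
    counts = Counter(mutation_type(m) for m in mutations)
    total = sum(counts.values())
    return {mtype: n / total for mtype, n in counts.items()}


def normalized_entropy(mutations):
    """Shannon entropy of the type distribution, scaled to the range [0, 1].

    A value of 1 means every possible class pair is equally represented;
    values near 0 mean the data set is dominated by a few mutation types.
    """
    dist = type_distribution(mutations)
    n_types = len(REDUCED_ALPHABET) ** 2
    entropy = -sum(p * math.log(p) for p in dist.values())
    return entropy / math.log(n_types)


if __name__ == "__main__":
    # Toy input, not taken from the data set.
    example = ["L45A", "V10A", "I7A", "K3E"]
    print(type_distribution(example))
    print(normalized_entropy(example))
```

A data set dominated by one class pair, such as large hydrophobic residues replaced by alanine, would give a low normalized entropy under this sketch, whereas a mutation-type-balanced subset would score closer to 1.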
Created:
2020-08-10



