Learning protein fitness models from evolutionary and assay-labeled data

DataONE · 2021-12-15 · Updated 2025-05-31 · Indexed

Download link:

https://search.dataone.org/view/sha256:8f2851442bd3edb905e2743f083cf835e2f9735babeaed79e019bb0896d0ef87

Description:

Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one density feature from modelling the evolutionary data. Within this approach, we find that a variational autoencoder-based density model showed the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.
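The combination described above (ridge regression on site-specific amino acid features plus one evolutionary density feature) can be sketched roughly as follows. This is a minimal illustration, not the authors' released code. It assumes one-hot encoding over the 20 standard amino acids and a precomputed density score per sequence, for example a log-likelihood from a VAE or another evolutionary model, passed in as a plain number. It also assumes a single penalty `lam` on every coefficient except an unpenalized intercept, and a closed-form solve in pure Python. The helper names (`one_hot`, `featurize`, `solve`, `fit_ridge`, `predict`) and the toy data are hypothetical.

```python
from typing import List, Sequence

# Assumption: the 20 standard amino acids, in a fixed (arbitrary) order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> List[float]:
    """Site-specific features: one 0/1 indicator per (position, residue) pair."""
    feats: List[float] = []
    for aa in seq:
        feats.extend(1.0 if aa == a else 0.0 for a in AMINO_ACIDS)
    return feats


def featurize(seq: str, density_score: float) -> List[float]:
    """Intercept + one-hot site features + a single density feature.

    `density_score` is assumed to be precomputed by some evolutionary
    density model (e.g. a VAE log-likelihood); here it is just a number.
    """
    return [1.0] + one_hot(seq) + [density_score]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ridge(X: Sequence[Sequence[float]], y: Sequence[float], lam: float) -> List[float]:
    """Closed-form ridge: (X^T X + lam * D) w = X^T y, where D penalizes
    every coefficient except the intercept at index 0."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(1, d):
        A[i][i] += lam
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(d)]
    return solve(A, b)


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Hypothetical toy data: 3-residue variants with made-up fitness labels
    # and made-up density scores standing in for model log-likelihoods.
    seqs = ["ACD", "ACE", "GCD", "ACF"]
    density = [-1.0, -0.5, -2.0, -0.8]
    fitness = [0.2, 0.6, -0.9, 0.4]
    X = [featurize(s, d) for s, d in zip(seqs, density)]
    w = fit_ridge(X, fitness, lam=1.0)
    print(round(predict(w, featurize("GCE", -1.2)), 3))
```

Because the density score enters as just one extra column, any evolutionary density model can be swapped in without changing the regression. Applying the same penalty to the density coefficient as to the site features is a simplification made here. How strongly to regularize that single feature is a design choice this sketch does not settle.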

Created:

2025-05-19
