
Learning protein fitness models from evolutionary and assay-labeled data

Source: DataONE. Updated: 2021-12-15. Indexed: 2025-05-31.
Download link:
https://search.dataone.org/view/sha256:8f2851442bd3edb905e2743f083cf835e2f9735babeaed79e019bb0896d0ef87
Description:
Machine learning-based models of protein fitness typically learn from either unlabeled, evolutionarily related sequences or variant sequences with experimentally measured labels. For regimes where only limited experimental data are available, recent work has suggested methods for combining both sources of information. Toward that goal, we propose a simple combination approach that is competitive with, and on average outperforms, more sophisticated methods. Our approach uses ridge regression on site-specific amino acid features combined with one density feature from modelling the evolutionary data. Within this approach, we find that a variational autoencoder-based density model shows the best overall performance, although any evolutionary density model can be used. Moreover, our analysis highlights the importance of systematic evaluations and sufficient baselines.
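The combination described above can be illustrated with a minimal sketch. It assumes that the site-specific amino acid features are a flattened one-hot encoding and that the density feature is a single per-sequence score (for example, a log-likelihood from any evolutionary density model). The function names, the uniform penalty `alpha`, and the toy sequences, scores, and labels are all hypothetical placeholders, not the authors' implementation or data.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_sites(seqs):
    """Flattened one-hot encoding: one indicator per (site, amino acid) pair.

    Assumes all sequences have equal length and use the 20 standard residues.
    """
    n, length = len(seqs), len(seqs[0])
    X = np.zeros((n, length * len(AMINO_ACIDS)))
    for i, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            X[i, pos * len(AMINO_ACIDS) + AA_INDEX[aa]] = 1.0
    return X


def build_features(seqs, density_scores):
    """Append the single density feature as one extra column."""
    d = np.asarray(density_scores, dtype=float).reshape(-1, 1)
    return np.hstack([one_hot_sites(seqs), d])


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    Assumption: the same penalty alpha is applied to every feature.
    """
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w


def predict(X, w, b):
    return X @ w + b


# Toy placeholders only: made-up variants, density scores, and fitness labels.
train_seqs = ["ACD", "ACE", "GCD", "GCE"]
train_density = [-1.0, -2.5, -0.5, -3.0]
train_fitness = [0.9, 0.4, 1.1, 0.2]

X_train = build_features(train_seqs, train_density)
w, b = fit_ridge(X_train, train_fitness, alpha=1.0)
predictions = predict(build_features(["GCD"], [-0.5]), w, b)
```

In practice the density scores would come from a model fitted to the evolutionarily related sequences, and `alpha` would be chosen by cross-validation on the labeled variants.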
Created:
2025-05-19