
Representations and associated fitness values of protein sequences

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10907604
Resource description:
We studied how well protein fitness can be predicted from sequence using our method MERGE and other methods (ECNet, eUniRep, EVmutation, One-Hot, and UniRep). This repository contains the following files.

The folder ECNet contains the braw, csv, and fasta files required to run ECNet.

The folders eUniRep, MERGE, One-Hot, and UniRep contain csv files with the names and fitness values of protein variants, together with a numerical representation of each variant's sequence. The folder MERGE also contains representations of the wild-type sequences as npy files in its wts subfolder.

The folder Params contains params files generated with PLMC.

The script get_performances.py builds models with different methods (eUniRep, EVmutation, MERGE, One-Hot, Pure_ML, and UniRep) and measures how well each one predicts the fitness of protein variants from sequence.

References
ECNet: Luo, Y., Jiang, G., Yu, T. et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat Commun 12, 5743 (2021).
eUniRep: Biswas, S., Khimulya, G., Alley, E.C. et al. Low-N protein engineering with data-efficient deep learning. Nat Methods 18, 389–396 (2021).
EVmutation: Hopf, T., Ingraham, J., Poelwijk, F. et al. Mutation effects predicted from sequence co-variation. Nat Biotechnol 35, 128–135 (2017).
UniRep: Alley, E.C., Khimulya, G., Biswas, S. et al. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods 16, 1315–1322 (2019).
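The csv files pair each variant's name and fitness value with a numerical sequence representation, and get_performances.py scores how well each method predicts fitness. The following is a minimal sketch of that kind of evaluation, not the repository's actual code. It assumes a column layout (name, fitness, then representation values) that this page does not document, and it uses Spearman's rank correlation as the performance metric, which is also an assumption because the page does not say which metric the script reports. The function names `load_variants`, `rank`, and `spearman` and the sample variant names are hypothetical.

```python
import csv
import io


def load_variants(text, skip_header=False):
    """Parse csv text into (name, fitness, representation) tuples.

    Assumed layout (hypothetical): column 1 = variant name,
    column 2 = fitness value, remaining columns = numerical representation.
    """
    reader = csv.reader(io.StringIO(text))
    if skip_header:
        next(reader, None)  # drop a header row if the file has one
    records = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        name, fitness, *features = row
        records.append((name, float(fitness), [float(x) for x in features]))
    return records


def rank(values):
    """Return 1-based ranks; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Toy data in the assumed layout; not taken from the repository.
    data = "A1V,0.5,1,0\nG2S,1.2,0,1\nL3P,-0.3,1,1\n"
    records = load_variants(data)
    measured = [fitness for _, fitness, _ in records]
    predicted = [0.4, 1.0, 0.1]  # stand-in model predictions
    print(spearman(measured, predicted))  # prints 1.0
```

A rank-based score is used here because it asks only whether a model orders variants correctly by fitness, not whether it matches absolute fitness values. Swapping in another metric would only change the `spearman` step.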
Created: 2024-04-23