Representations and associated fitness values of protein sequences
Indexed by NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10907604
Resource description:
We studied how well protein fitness can be predicted from sequence using our method, MERGE, and other methods (namely ECNet, eUniRep, EVmutation, One-Hot, and UniRep). This repository contains the following files:
The folder ECNet contains the braw, csv, and fasta files required to run ECNet.
The folders eUniRep, MERGE, One-Hot, and UniRep contain csv files with the names and fitness values of protein variants as well as a numerical representation of their sequences. The folder MERGE also contains representations of the wild-type sequences as npy files in the wts folder.
The folder Params contains params files generated with PLMC.
The script get_performances.py can be used to build models with different methods (namely eUniRep, EVmutation, MERGE, One-Hot, Pure_ML, and UniRep) and to evaluate how well they predict the fitness of protein variants from sequence.
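For readers unfamiliar with the One-Hot representation, the sketch below shows one common way to turn a protein sequence into a flat binary vector. It is an illustration under stated assumptions, not the encoding used to produce the files in the One-Hot folder: the 20-letter alphabet and its ordering (`AMINO_ACIDS`), the function name `one_hot_encode`, and the flattened position-major layout are all hypothetical.

```python
# Hypothetical alphabet: the 20 canonical amino acids in one-letter-code order.
# The ordering used in the repository's One-Hot files is an assumption here.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence: str) -> list[int]:
    """Return a flattened one-hot vector of length 20 * len(sequence).

    Position p occupies entries [20 * p, 20 * p + 20); exactly one of them
    is set to 1, namely the index of the residue found at position p.
    """
    width = len(AMINO_ACIDS)
    vector = [0] * (width * len(sequence))
    for pos, aa in enumerate(sequence.upper()):
        if aa not in AA_INDEX:
            # Gaps or non-canonical residues are not handled in this sketch.
            raise ValueError(f"unsupported residue {aa!r} at position {pos}")
        vector[pos * width + AA_INDEX[aa]] = 1
    return vector


if __name__ == "__main__":
    encoded = one_hot_encode("MKV")
    print(len(encoded), sum(encoded))  # 60 3
```

Each position contributes 20 entries with exactly one set to 1, so a sequence of length L yields a vector of length 20L. Compare against the csv files before assuming this layout matches theirs.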
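This description does not state which metric get_performances.py reports. As an assumption, the sketch below computes Spearman's rank correlation between measured and predicted fitness values, a common way to score how well a model ranks variants. The function names `average_ranks` and `spearman_rho` are hypothetical; in practice `scipy.stats.spearmanr` computes the same statistic.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(measured, predicted):
    """Spearman's rho as the Pearson correlation of (tie-averaged) ranks."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(measured), average_ranks(predicted)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    measured = [0.1, 0.5, 0.3, 0.9]   # illustrative values, not repository data
    predicted = [1.0, 2.5, 2.0, 4.0]
    print(spearman_rho(measured, predicted))  # 1.0
```

Because only ranks enter the calculation, a model is rewarded for ordering variants correctly even if its predicted values are on a different scale from the measured fitness.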
References
ECNet
Luo, Y., Jiang, G., Yu, T. et al. ECNet is an evolutionary context-integrated deep learning framework for protein engineering. Nat Commun 12, 5743 (2021).
eUniRep
Biswas, S., Khimulya, G., Alley, E.C. et al. Low-N protein engineering with data-efficient deep learning. Nat Methods 18, 389–396 (2021).
EVmutation
Hopf, T., Ingraham, J., Poelwijk, F. et al. Mutation effects predicted from sequence co-variation. Nat Biotechnol 35, 128–135 (2017).
UniRep
Alley, E.C., Khimulya, G., Biswas, S. et al. Unified rational protein engineering with sequence-based deep representation learning. Nat Methods 16, 1315–1322 (2019).
Created:
2024-04-23



