Pool PaRTI protein sequence embeddings and residue importance scores for ESM-2 650M and protBERT
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15036724
Description:
This dataset is version 2 of zenodo.org/records/14080821 and accompanies the paper "Pool PaRTI: A PageRank-Based Pooling Method for Identifying Critical Residues and Enhancing Protein Sequence Representations."
For two protein language models (PLMs), ESM-2 650M and protBERT, and more than 20,000 proteins from UniProt (covering all Homo sapiens proteins), we present:
1) the protein sequence embeddings generated by Pool PaRTI
2) the importance weights assigned to each residue of every protein by Pool PaRTI in the npz files.
The individual proteins are indexed by their UniProt accession codes. If you need sequence embeddings or residue importance values for sequences not in the dataset, use the repository linked below to generate them.
github.com/Helix-Research-Lab/Pool_PaRTI.git
You can also reach out to the authors for any clarification.
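The npz files mentioned above are NumPy archives, so a minimal sketch of inspecting one might look like the following. The file name `P04637.npz`, the array key `importance`, and the helper functions are assumptions made for illustration only; the actual file layout and key names in the dataset are not stated here and should be checked by listing the archive contents first. The example writes a small synthetic file so that it runs without the dataset.

```python
import os
import tempfile

import numpy as np


def load_npz_arrays(path):
    """Return a dict mapping each array name stored in an .npz file to its array."""
    with np.load(path) as archive:
        return {name: archive[name] for name in archive.files}


def top_residues(weights, k=5):
    """Return the 1-based positions of the k highest-weight residues, highest first."""
    weights = np.asarray(weights).ravel()
    order = np.argsort(weights)[::-1][:k]
    return (order + 1).tolist()


# Synthetic stand-in for a dataset file. Both the file name (a UniProt
# accession) and the key "importance" are hypothetical, not taken from the dataset.
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "P04637.npz")
np.savez(demo_path, importance=np.array([0.05, 0.40, 0.10, 0.30, 0.15]))

# List what the archive actually contains before relying on any key name.
arrays = load_npz_arrays(demo_path)
for name, arr in arrays.items():
    print(name, arr.shape, arr.dtype)

# Rank residues by importance weight (positions are 1-based).
demo_top = top_residues(arrays["importance"], k=2)
print(demo_top)
```

For a real file from the dataset, replace `demo_path` with the path to the downloaded archive and use whichever key names the listing step prints.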
Created:
2025-03-17



