Dataset of AlphaFold's internal representations of 4,581 proteins relevant for drug discovery
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/10671260
Description:
This dataset contains the outputs of the AlphaFold model for 4,581 proteins that are relevant targets in drug discovery.
More information on the dataset can be found at the following repository:
Dataset structure:
↓ data/* -> main data directory
↓ data/PID/* -> data of a single protein of length L
Filename            Description                                               Tensor shape    Lightweight
single.npy          (s_i) Evoformer single representation                     [L x 384]       ✔️
structure.npy       (a_i) output of the last layer of the structure module    [L x 384]       ✔️
msa.npy***          (m_si) processed MSA representation                       [N x L x 256]
pair.npy***         (z_ij) Evoformer pair representation                      [L x L x 128]
PID.pdb             3D protein structure prediction                           -               ✔️
PID_unrelaxed.pdb   3D protein structure prediction w/o relaxation step (D)   -               ✔️
confidence.npy*     overall confidence in structure prediction (0-100)        1 (scalar)      ✔️
plldt.npy*          confidence in structure prediction per residue (pLDDT)    [L]             ✔️
PID.fasta           protein amino acid sequence and metadata                  -               ✔️
timings.json        processing log                                            -               ✔️
↓ data/PID2/* -> data of protein #2
...
*Note: L = sequence length; N = number of sequences aligned in the MSA.
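The directory layout and tensor shapes above suggest a simple way to read one protein and check it against the table. The following is a minimal sketch that assumes NumPy and the file names exactly as listed, including `plldt.npy`. The functions `load_protein`, `protein_embedding` and `load_all` are hypothetical helpers and are not part of the dataset. The large `pair.npy` and `msa.npy` files are memory-mapped only on request, because they are not in the lightweight version. Mean pooling of the single representation is an illustrative choice, not the authors' method.

```python
from pathlib import Path

import numpy as np

SINGLE_DIM = 384  # channels of single.npy / structure.npy ([L x 384])
PAIR_DIM = 128    # channels of pair.npy ([L x L x 128])
MSA_DIM = 256     # channels of msa.npy ([N x L x 256])


def _check_shape(name, arr, expected):
    """Raise if an array does not have the shape listed in the dataset table."""
    if arr.shape != expected:
        raise ValueError(f"{name}: expected shape {expected}, got {arr.shape}")


def load_protein(protein_dir, load_pair=False, load_msa=False):
    """Load one data/PID/ directory into a dict of NumPy arrays.

    Hypothetical helper: file names are taken verbatim from the dataset table.
    The sequence length L is inferred from single.npy.
    """
    d = Path(protein_dir)
    single = np.load(d / "single.npy")        # (s_i) Evoformer single repr.
    L = single.shape[0]
    structure = np.load(d / "structure.npy")  # (a_i) structure-module output
    plddt = np.load(d / "plldt.npy")          # per-residue confidence
    # confidence.npy holds a single value; ravel handles both () and (1,) shapes.
    confidence = float(np.ravel(np.load(d / "confidence.npy"))[0])

    _check_shape("single.npy", single, (L, SINGLE_DIM))
    _check_shape("structure.npy", structure, (L, SINGLE_DIM))
    _check_shape("plldt.npy", plddt, (L,))

    protein = {"id": d.name, "length": L, "single": single,
               "structure": structure, "plddt": plddt, "confidence": confidence}

    # pair.npy and msa.npy are large and not in the lightweight version,
    # so they are memory-mapped (read lazily from disk) only when requested.
    if load_pair:
        pair = np.load(d / "pair.npy", mmap_mode="r")
        _check_shape("pair.npy", pair, (L, L, PAIR_DIM))
        protein["pair"] = pair
    if load_msa:
        msa = np.load(d / "msa.npy", mmap_mode="r")
        if msa.ndim != 3 or msa.shape[1:] != (L, MSA_DIM):
            raise ValueError(f"msa.npy: expected (N, {L}, {MSA_DIM}), got {msa.shape}")
        protein["msa"] = msa
    return protein


def protein_embedding(protein, weight_by_plddt=False):
    """Pool the [L x 384] single representation into one 384-dim vector.

    Mean pooling (optionally weighted by per-residue confidence) is an
    illustrative assumption, not a method prescribed by the dataset.
    """
    s = protein["single"].astype(np.float64)
    if weight_by_plddt:
        w = protein["plddt"].astype(np.float64)
        return (w[:, None] * s).sum(axis=0) / w.sum()
    return s.mean(axis=0)


def load_all(data_root="data"):
    """Yield (PID, embedding) for each protein sub-directory of data_root."""
    for d in sorted(p for p in Path(data_root).iterdir() if p.is_dir()):
        yield d.name, protein_embedding(load_protein(d))
```

For example, `dict(load_all("data"))` maps each PID to a 384-dimensional vector that could feed a downstream model. Whether mean pooling, confidence weighting or the full pair representation is appropriate depends on the task.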
Created:
2024-11-14



