FireProtDB + PDB Structural Protein Stability Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8169288
Description:
Dataset compiled and curated for use in the ThermoMPNN paper (https://doi.org/10.1073/pnas.2314853121).
Dataset for training models that predict thermodynamic stability changes (ddG) of protein point mutations given a wild-type protein structure (PDB) file. Data was assembled by matching sequence-based ddG measurements in FireProtDB to structures from the RCSB Protein Data Bank (PDB). For details, see the Methods section of our manuscript.
Citing this work: If you choose to use this dataset for your own research, please cite this repository and the ThermoMPNN paper: https://doi.org/10.1073/pnas.2314853121.
Contents:
pdbs/ directory contains all PDB files
csvs/ directory contains all CSVs with mutation data
csvs/4_fireprotDB_bestpH.csv is the main (full) dataset file with 3,438 mutations across 100 proteins.
csvs/fireprot_splits.pkl contains the dataset splits (train/val/test) used in our study
csvs/splits/ contains one CSV per split (train/val/test/homologue-free), each indexed from the full dataset CSV.
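A minimal sketch for loading the main mutation table with only the Python standard library and checking the summary counts stated above. It assumes the CSV has a header row containing the pdb_id_corrected column described below; the helper name summarize_dataset is hypothetical and not part of the dataset release.

```python
import csv
from typing import Iterable, Tuple


def summarize_dataset(lines: Iterable[str]) -> Tuple[int, int]:
    """Count mutation rows and distinct proteins in a FireProtDB-style CSV.

    `lines` may be an open file handle or any iterable of CSV text lines.
    Assumes a header row that includes the `pdb_id_corrected` column
    (hypothetical helper, not shipped with the dataset).
    """
    rows = list(csv.DictReader(lines))
    # One row per mutation; proteins are identified by the curated PDB ID.
    proteins = {row["pdb_id_corrected"] for row in rows}
    return len(rows), len(proteins)
```

Passing an open handle, for example `summarize_dataset(open("csvs/4_fireprotDB_bestpH.csv", newline=""))`, should return `(3438, 100)` if the file matches the description above. The format of csvs/fireprot_splits.pkl is not documented here, so inspect it with `pickle.load` before relying on any particular structure.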
Important CSV columns:
pdb_id_corrected: corresponds to the PDB in the pdbs/ directory (after curation and disambiguation)
ddG: stability change for the mutation, computed as mutant minus wild type (mutant - WT)
wild_type: wild-type amino acid (1-letter code)
mutation: mutant amino acid (1-letter code)
pdb_position: 0-based index of the mutated residue in the PDB file (may be different from position in the original FireProtDB sequence entry)
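Because pdb_position is a 0-based index into the PDB residues rather than the original FireProtDB numbering, it is worth checking that the wild_type letter actually appears at that index before building a mutant. The sketch below assumes you already have the one-letter residue sequence read from the PDB file in order (how chains and missing residues are handled is an assumption here; see the ThermoMPNN Methods). The names Mutation, parse_row and apply_mutation are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Mutation:
    pdb_id: str      # pdb_id_corrected: file name stem in pdbs/
    wild_type: str   # 1-letter wild-type residue
    mutant: str      # 1-letter mutant residue (the `mutation` column)
    position: int    # 0-based index into the PDB residue sequence
    ddg: float       # mutant - WT, as defined for the ddG column


def parse_row(row: Mapping[str, str]) -> Mutation:
    """Convert one CSV row (e.g. from csv.DictReader) into a Mutation."""
    return Mutation(
        pdb_id=row["pdb_id_corrected"],
        wild_type=row["wild_type"],
        mutant=row["mutation"],
        position=int(row["pdb_position"]),
        ddg=float(row["ddG"]),
    )


def apply_mutation(pdb_sequence: str, mut: Mutation) -> str:
    """Return the mutant sequence after checking the wild-type residue.

    `pdb_sequence` is assumed to be the one-letter sequence extracted from
    the PDB file in residue order, so `position` indexes it directly.
    """
    if not 0 <= mut.position < len(pdb_sequence):
        raise IndexError(
            f"position {mut.position} outside sequence of length {len(pdb_sequence)}"
        )
    found = pdb_sequence[mut.position]
    if found != mut.wild_type:
        raise ValueError(f"expected {mut.wild_type} at {mut.position}, found {found}")
    # Swap exactly one residue; every other position is unchanged.
    return pdb_sequence[: mut.position] + mut.mutant + pdb_sequence[mut.position + 1 :]
```

Raising on a mismatch, rather than silently substituting, surfaces rows whose numbering does not line up with the sequence you extracted.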
Created: 2024-01-30



