
FireProtDB + PDB Structural Protein Stability Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8169288
Description:
Dataset compiled and curated for use in the ThermoMPNN paper: https://doi.org/10.1073/pnas.2314853121

Dataset for training models for prediction of thermodynamic stability changes (ddG) of protein point mutations given a wildtype protein structure (PDB) file. Data was assembled by matching sequence-based ddG measurements in FireProtDB to structures from the RCSB Protein Data Bank (PDB). For details, see the Methods section of our manuscript.

Citing this work: If you choose to use this dataset for your own research, please cite this repository and the ThermoMPNN paper: https://doi.org/10.1073/pnas.2314853121

Contents:
- pdbs/ directory contains all PDB files
- csvs/ directory contains all CSVs with mutation data
- csvs/4_fireprotDB_bestpH.csv is the main (full) dataset file with 3,438 mutations across 100 proteins
- csvs/fireprot_splits.pkl contains the dataset splits (train/val/test) used in our study
- csvs/splits/ contains CSVs for each of the splits (train/val/test/homologue-free) indexed from the full dataset CSV

Important CSV columns:
- pdb_id_corrected: corresponds to the PDB in the pdbs/ directory (after curation and disambiguation)
- ddG: ddG value for mutation (mutant - WT)
- wild_type: wild-type amino acid (1-letter code)
- mutation: mutant amino acid (1-letter code)
- pdb_position: 0-based index of the mutated residue in the PDB file (may be different from position in the original FireProtDB sequence entry)
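The column descriptions above can be illustrated with a minimal sketch that reads mutation rows and groups them by structure. It uses only the column names listed in the description; the sample rows, the PDB IDs in them, the `load_mutations` helper, and the 1-based label format (for example `A42G`) are assumptions made for illustration and are not part of the dataset or the ThermoMPNN code.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are NOT taken from the dataset.
SAMPLE_CSV = """pdb_id_corrected,wild_type,mutation,pdb_position,ddG
1ABC,A,G,41,1.2
1ABC,L,P,0,-0.5
2XYZ,K,E,9,0.3
"""


def load_mutations(handle):
    """Group mutation rows by PDB ID (hypothetical helper)."""
    by_pdb = defaultdict(list)
    for row in csv.DictReader(handle):
        # pdb_position is described as a 0-based index into the PDB file.
        index = int(row["pdb_position"])
        by_pdb[row["pdb_id_corrected"]].append({
            "index": index,
            # Assumed display convention: wild type, 1-based index, mutant.
            "label": f'{row["wild_type"]}{index + 1}{row["mutation"]}',
            # ddG is described as (mutant - WT).
            "ddG": float(row["ddG"]),
        })
    return dict(by_pdb)


mutations = load_mutations(io.StringIO(SAMPLE_CSV))
for pdb_id, rows in mutations.items():
    for m in rows:
        print(pdb_id, m["label"], m["ddG"])
```

To read the real data, the same function could presumably be given an open file handle for csvs/4_fireprotDB_bestpH.csv (for example `open(path, newline="")`), assuming the archive has been downloaded and extracted locally. Because pdb_position may differ from the position in the original FireProtDB sequence entry, the labels produced here index into the PDB file and should not be compared directly with FireProtDB numbering.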
Created:
2024-01-30