Data for "How Natural Sequence Variation Modulates Protein Folding Dynamics"
Indexed in the NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14547874
Description
This repository contains the data associated with the article "How Natural Sequence Variation Modulates Protein Folding Dynamics".
Authors: Ezequiel A. Galpern, Ernesto A. Roman, Diego U. Ferreiro
DOI: 10.48550/arXiv.2412.14341
The dataset includes:
Multiple Sequence Alignments (MSAs): Provided as .fasta files for 15 protein families.
Potts Models: Saved as NumPy .npz archives (dictionary-like, with one array per key), which include:
h: Local fields of the Potts model.
J: Couplings of the Potts model.
This dataset is designed to support reproducibility and further exploration of the findings presented in the article.
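For orientation, the local fields h and couplings J of a Potts model are conventionally combined into an energy for an aligned sequence s = (s_1, ..., s_L) of length L, as below. This is the generic textbook form, stated as an assumption: the sign convention, the gauge, and whether each pair is counted once (i < j) or twice may differ from those used to fit these particular models, so check the accompanying code before comparing absolute energy values.

```latex
E(s) = -\sum_{i=1}^{L} h_i(s_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(s_i, s_j)
```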
Data Structure
/simplified_rbm_and_msa/: Contains a folder for each of the 15 protein families. Each folder includes:
A .fasta file for the multiple sequence alignment (MSA) of the protein family.
A .npz file containing the Potts model, with local fields (h) and couplings (J), saved in NumPy format.
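Because each family folder pairs one MSA with one Potts model, a small helper can walk the directory and return both paths per family. The sketch below is an illustration, not part of the dataset's code: the function name `iter_families` and the local path `simplified_rbm_and_msa` (where the download is assumed to be unpacked) are hypothetical, and it assumes exactly one .fasta and one .npz per folder without relying on specific file names.

```python
from pathlib import Path

import numpy as np


def iter_families(root):
    """Yield (family_name, fasta_path, npz_path) for each family folder under root.

    Hypothetical helper: assumes each family folder holds exactly one .fasta
    and one .npz file, as described in the Data Structure section.
    """
    for family_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        fastas = sorted(family_dir.glob("*.fasta"))
        npzs = sorted(family_dir.glob("*.npz"))
        if len(fastas) != 1 or len(npzs) != 1:
            # Skip folders that do not match the one-MSA/one-model layout
            continue
        yield family_dir.name, fastas[0], npzs[0]


if __name__ == "__main__":
    # Hypothetical location of the unpacked download
    root = Path("simplified_rbm_and_msa")
    if root.is_dir():
        for name, fasta_path, npz_path in iter_families(root):
            with np.load(npz_path) as potts:
                print(name, fasta_path.name, potts["h"].shape, potts["J"].shape)
```

Printing the array shapes per family is a quick way to confirm the sequence length L and alphabet size q before any further analysis.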
File Format Details
MSA Files
Format: .fasta
Example Usage: Load using any standard MSA tool or Python libraries such as Biopython.
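With Biopython, `Bio.AlignIO.read(path, "fasta")` reads an alignment directly. The dependency-free sketch below instead parses the FASTA file and encodes it as an integer matrix, the form usually needed to evaluate a Potts model. The 21-letter alphabet and its ordering are assumptions for illustration only: the ordering used to fit the models in this dataset is not stated here, so it must be checked against the GitHub code before these indices are used with h and J.

```python
import numpy as np

# Hypothetical alphabet: gap plus the 20 amino acids. The ordering used to fit
# this dataset's Potts models is NOT documented here; verify it before use.
ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"


def read_fasta(path):
    """Return a list of (header, sequence) tuples from a FASTA file."""
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:], []
            else:
                chunks.append(line)  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def encode_msa(sequences, alphabet=ALPHABET):
    """Map aligned sequences to an (N, L) integer array.

    Lowercase letters are upper-cased; symbols not in the alphabet map to index 0.
    """
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    index = {aa: k for k, aa in enumerate(alphabet)}
    return np.array(
        [[index.get(c.upper(), 0) for c in seq] for seq in sequences], dtype=np.int64
    )
```

Mapping unknown symbols to the gap index is one simple choice; dropping sequences with unknown residues is an equally valid alternative.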
Potts Model Files
Format: .npz (NumPy compressed archive)
Contents:
h: Local fields, accessible as potts['h'].
J: Couplings, accessible as potts['J'].
Example Usage:
```python
import numpy as np

# "potts_file.npz" stands for the .npz file inside a family folder
potts = np.load("potts_file.npz")
h = potts['h']  # Local fields
J = potts['J']  # Couplings
```
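Once h and J are loaded, a common next step is scoring an integer-encoded sequence. The function below is a minimal sketch of the standard Potts energy, assuming h has shape (L, q) and J has shape (L, L, q, q) with J[i, j, a, b] == J[j, i, b, a]. Neither the array layout nor the sign convention is confirmed by this description, and the name `potts_energy` is hypothetical; inspect `h.shape` and `J.shape` and consult the GitHub code first.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Energy E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j].

    Assumed layouts: h is (L, q), J is (L, L, q, q); seq holds L integer states.
    """
    seq = np.asarray(seq)
    positions = np.arange(seq.shape[0])
    # Field term: h[i, s_i] for every position i
    field_term = h[positions, seq].sum()
    # pair[i, j] = J[i, j, s_i, s_j] for every pair of positions
    pair = J[positions[:, None], positions[None, :], seq[:, None], seq[None, :]]
    # Count each unordered pair once by keeping the strict upper triangle
    coupling_term = np.triu(pair, k=1).sum()
    return -(field_term + coupling_term)
```

For example, with `potts` opened as above and `s` an encoded sequence of matching length, `potts_energy(s, potts['h'], potts['J'])` returns a scalar. Fancy indexing avoids an explicit double loop over positions, which keeps the cost at one (L, L) gather per sequence.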
Usage Instructions
To use this dataset, refer to the corresponding GitHub repository, which includes:
Codebase: All scripts required to process and analyze the data.
Demonstration Notebook: A ready-to-run Jupyter Notebook for Google Colab.
GitHub Repository
Access the repository here: github.com/eagalpern/folding-ising-globular
Citing This Dataset
If you use this dataset, please cite:
Galpern, E. A., Roman, E. A., & Ferreiro, D. U. (2024). "How Natural Sequence Variation Modulates Protein Folding Dynamics". arXiv. DOI: 10.48550/arXiv.2412.14341.
Created:
2024-12-23



