Project files provided as supporting information to the manuscript "Information-theoretical measures identify accurate low-resolution representations of protein configurational space"

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://zenodo.org/record/6554497

Resource description:

The dataset contains the following compressed folder:

-Notebooks.zip: This folder contains:
  -python_script:
    -RESREL.py: script performing the clustering and computing the Relevance-Resolution curves
    -random_curves.py: script generating the random values and computing the corresponding Relevance-Resolution curves
    -Cluster_distance_matrix.py: script returning the distances among clusters for a given partition
  -python_notebook:
    -Exploratory_analysis.ipynb: analysis performed on the 12-protein_dataset
    -DMAPS_ANTI.ipynb: Diffusion Map for the antibody
    -DMAPS_COV_1ake.ipynb: Diffusion Map + inter/intra-state decomposition of the covariance for 1ake

Packages required to use these Python scripts/notebooks:
-numpy
-pandas
-matplotlib
-seaborn
-multiprocessing
-scipy

======== RAW DATA ========

The raw data produced and employed in this study are available in a Google Drive folder at the following address: https://drive.google.com/drive/folders/1PasAUCgpR5-gdzUVEdyusgZIayQN0Le9

In this folder, together with the compressed Notebooks.zip folder, one can find the compressed folder Data.zip, which contains the following data:

-12-protein_dataset:
  -md.mdp: the .mdp file used in the MD simulations
  -PROTEIN_PDB_CODE:
    -Hk_{sel}.npy & Hs_{sel}.npy: the Relevance & Resolution curves, sel=[all, CA, CB]
    -RMSD_{sel}.npy: the RMSD matrix, sel=[all, CA, CB]
    -npt.gro: protein+water+ions structure at the end of the equilibration (NVT+NPT)
    -MSR_df.csv: a dataset containing the following columns:
      'area': area under the Relevance-Resolution curve;
      'selection': the atomic selection (['all', 'CA', 'CB']) used to compute the RMSD matrix used for the clustering (and consequently the Relevance-Resolution curves);
      'method': the linkage measure used in the clustering procedure, an integer in [0,6];
      'method_name': the linkage measure used in the clustering procedure, a string in ['average','ward','complete','single','centroid','median','weighted'];
      'rmsd_mean': the mean value of the RMSD vector along the trajectory, computed with respect to the first frame;
      'rmsd_var': the variance of the RMSD vector along the trajectory, computed with respect to the first frame;
      'rgy_mean': the mean value of the radius of gyration along the trajectory;
      'rgy_var': the variance of the radius of gyration along the trajectory;
      'rmsf_mean': the mean value of the RMSF;
      'rmsf_var': the variance of the RMSF;
      'RMSD_M_mean': the mean value of the RMSD matrix;
      'RMSD_M_var': the variance of the RMSD matrix.
-Random:
  -curves.npy: 100K random Relevance-Resolution curves for M=40001
  -curves_s.npy: 100K random Relevance-Resolution curves for M=15000
-validation_dataset:
  -antibody:
    -Hk_CB.npy & Hs_CB.npy: the Relevance & Resolution curves
    -RMSD_CB.npy: the RMSD matrix
    -DIFF_{M}.npy: the eigenvalues/eigenvectors of the 10-D diffusion space
    -Label_{method}.npy: the label vector for n_clusters
  -1ake:
    -Hk_{sel}.npy & Hs_{sel}.npy: the Relevance & Resolution curves
    -RMSD_{sel}.npy: the RMSD matrix
    -DIFF_{M}.npy: the eigenvalues/eigenvectors of the 10-D diffusion space
    -Label_{method}.npy: the label vector for n_clusters
    -intra_{m}.npy: the intra-cluster covariance matrix
    -inter_cov_{m}.npy: the inter-cluster correlation matrix

NOTE
=====

The matrices of the cluster distances for adenylate kinase and the antibody have been computed with the script Cluster_distance_matrix.py. These matrices have not been included in the dataset because of their large size; the raw data are, however, available upon request.

Creation date:

2022-05-17