A comparison of reduced coordinate sets for describing protein structure
Source: Figshare. Updated 2016-01-18; indexed 2026-04-08.
Download link:
https://figshare.com/articles/dataset/A_comparison_of_reduced_coordinate_sets_for_describing_protein_structure/798825/1
Description:
This file contains supplementary material for the following publication:

Title: A comparison of reduced coordinate sets for describing protein structure
Authors: Konrad Hinsen, Shuangwei Hu, Gerald R. Kneller, Antti J. Niemi
Journal: Journal of Chemical Physics 139, 124115 (2013)
DOI: 10.1063/1.4821598

It contains the software implementing the computations described in the article, the input dataset, the output datasets, and the figures. A detailed list is given below.

Instructions for use

The file software_and_datasets.ap is an HDF5 file that can be read with any HDF5-compatible software, including the free HDFView package (http://www.hdfgroup.org/hdf-java-html/hdfview/). HDFView can be used to inspect the tables contained in this file. Reading the protein structures requires software that understands the Mosaic data model (http://bitbucket.org/molsim/mosaic/).

The file software_and_datasets.ap has been prepared using the ActivePapers framework (http://bitbucket.org/khinsen/active_papers_py) for reproducible research. It contains all the software used in the study described in the article, from the download of the protein structures to the generation of the figures. All software is written in the Python language. The ActivePapers framework keeps track of which data was generated by which script, and also records the user name, the machine name, and the versions of all software packages used when running each script. Readers wishing to modify and re-run the scripts, or to run them on different input data, should download and install the ActivePapers framework.

Datasets contained in the ActivePaper file

/data/pdb_structures
The original protein structures imported from the PDB. Only the backbone atoms relevant for our study (N, CA, C) are stored. There is one subgroup for each PDB code, and each subgroup contains a Mosaic universe and a Mosaic configuration. For more information about Mosaic, see https://bitbucket.org/molsim/mosaic/.

/data/coordinate_analysis/
    averages
    variances
    histograms
The distributions of the internal coordinates of the backbone, computed over all residues of all protein structures. For each coordinate, there is an average value, a variance, and a histogram of the values.

/data/reconstructions
The protein backbone configurations reconstructed from various reduced coordinate sets.

/data/number_of_residues
/data/root_mean_square_distances
/data/radii_of_gyration
Tables containing, for each protein, the number of residues, the root-mean-square distances from each reconstruction to the initial structure, and the radii of gyration for the initial structure and all reconstructions.

/data/rg_analysis
The fits of the asymptotic large-N behavior of the radii of gyration for the initial configurations and all reconstructions.

/documentation/
    2OVU_initial.pdb
    2OVU_ca.pdb
    2OVU_phipsi.pdb
The PDB files for the initial configuration, the reconstruction from virtual-CA coordinates, and the reconstruction from phi-psi coordinates.

/documentation/
    histograms_distance.pdf
    histograms_angle.pdf
    histograms_dihedral.pdf
The histograms of the internal coordinate values.

/documentation/rmsd.pdf
The plot of the root-mean-square distances between the reconstructions and the initial configurations.

/documentation/rg.pdf
The plots of the radii of gyration for each reconstruction, with the asymptotic fits.
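Because software_and_datasets.ap is an HDF5 file, its groups and tables can also be listed from a script instead of HDFView. The following is a minimal sketch, assuming the third-party h5py package is installed. The function name list_hdf5_contents is made up for this illustration and is not part of ActivePapers or Mosaic. It only enumerates groups and datasets; it does not interpret the Mosaic universes or configurations.

```python
import h5py


def list_hdf5_contents(path):
    """Return (name, kind, shape) tuples for every group and dataset in an HDF5 file.

    Hypothetical helper for illustration only; names are reported
    relative to the file root, without a leading slash.
    """
    entries = []

    def visit(name, obj):
        # h5py calls this once per object below the root group.
        if isinstance(obj, h5py.Dataset):
            entries.append((name, "dataset", obj.shape))
        elif isinstance(obj, h5py.Group):
            entries.append((name, "group", None))

    # Open read-only so the archived file cannot be modified by accident.
    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return entries
```

Calling list_hdf5_contents("software_and_datasets.ap") from the directory holding the downloaded file should then report the groups described above, for example data/pdb_structures. Modifying or re-running the scripts still requires the ActivePapers framework.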
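The quantities tabulated under /data/root_mean_square_distances, /data/radii_of_gyration, and /data/rg_analysis can be illustrated with a short standard-library sketch. It is not the code stored in the ActivePaper file and rests on several assumptions that the description does not settle: the radius of gyration is computed without mass weighting, the root-mean-square distance is taken between matched atoms without any prior superposition, and the large-N behavior is modeled as a power law Rg = r0 * N^nu fitted by linear regression in log-log space. All function names are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points (assumption: no masses)."""
    n = len(coords)
    center = [sum(p[i] for p in coords) / n for i in range(3)]
    mean_sq = sum(
        sum((p[i] - center[i]) ** 2 for i in range(3)) for p in coords
    ) / n
    return math.sqrt(mean_sq)


def rmsd(coords_a, coords_b):
    """Root-mean-square distance between matched points (assumption: no superposition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must have the same number of points")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def fit_power_law(n_values, rg_values):
    """Least-squares fit of log(Rg) = log(r0) + nu * log(N); returns (r0, nu).

    The power-law form is an assumption made for this sketch.
    """
    xs = [math.log(n) for n in n_values]
    ys = [math.log(r) for r in rg_values]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    nu = sxy / sxx
    r0 = math.exp(y_mean - nu * x_mean)
    return r0, nu
```

Here coords would be the backbone atom positions of one protein, and n_values and rg_values would hold the number of residues and the radius of gyration for each protein. The scripts inside the ActivePaper file remain the authoritative reference for how the published values were obtained.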
Provided by:
Shuangwei Hu
Created:
2013-09-13



