Redocking the PDB

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/7579501

Description:

This repository contains supplementary data to the journal article 'Redocking the PDB' by Flachsenberg et al. (https://doi.org/10.1021/acs.jcim.3c01573)[1]. In this paper, we described two datasets: the PDBScan22 dataset, a large set of 322,051 macromolecule–ligand binding sites generally suitable for redocking, and the PDBScan22-HQ dataset, with 21,355 binding sites passing different structure quality filters. These datasets were further characterized by calculating properties of the ligand (e.g., molecular weight), properties of the binding site (e.g., volume), and structure quality descriptors (e.g., crystal structure resolution). Additionally, we performed redocking experiments with our novel JAMDA structure preparation and docking workflow[1] and with AutoDock Vina[2,3]. Details for all these experiments and the dataset composition can be found in the journal article[1].

Here, we provide all the datasets, i.e., the PDBScan22 and PDBScan22-HQ datasets as well as the docking results and the additionally calculated properties (for the ligands, the binding sites, and structure quality descriptors). Furthermore, we give a detailed description of their content (i.e., the data types and a description of the column values). All datasets consist of CSV files with the actual data and associated metadata JSON files describing their content. The CSV/JSON files are compliant with the CSV on the Web standard (https://csvw.org/).

General hints

All docking experiment results consist of two CSV files, one with general information about the docking run (e.g., was it successful?) and one with individual pose results (i.e., score and RMSD to the crystal structure).

All files (except for the docking pose tables) can be indexed uniquely by the column tuple '(pdb, name)' containing the PDB code of the complex (e.g., 1gm8) and the name of the ligand (three fields joined by underscores, e.g., 'SOX_B_1559').
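The key convention described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not part of the published workflow: the helper name `index_by_key`, the toy rows other than '1gm8'/'SOX_B_1559', and the `value` column are hypothetical; with the real data, the table would instead be loaded with `pd.read_csv` from one of the non-pose CSV files.

```python
import pandas as pd

KEY = ["pdb", "name"]  # key columns named in the dataset description


def index_by_key(df: pd.DataFrame) -> pd.DataFrame:
    """Return df indexed by (pdb, name); raise ValueError if the key is not unique."""
    # verify_integrity=True makes pandas reject duplicate key tuples
    return df.set_index(KEY, verify_integrity=True)


# Toy stand-in for a non-pose table; all rows except the first are made up.
toy = pd.DataFrame(
    {
        "pdb": ["1gm8", "1gm8", "2abc"],
        "name": ["SOX_B_1559", "SOX_A_1558", "LIG_A_301"],
        "value": [1.0, 2.0, 3.0],  # hypothetical property column
    }
)

indexed = index_by_key(toy)
# Look up one binding site by its full key tuple
row = indexed.loc[("1gm8", "SOX_B_1559")]
```

Applying the same helper to a docking pose table would be expected to raise, since those tables may hold several rows per '(pdb, name)' pair.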
All files (except for the docking pose tables) have exactly the same number of rows as the dataset they were calculated on (e.g., PDBScan22 or PDBScan22-HQ). However, some CSV files may have missing values (see also the JSON metadata files) in some or even all columns (except for 'pdb' and 'name').

The docking pose tables also contain the 'pdb' and 'name' columns. However, these alone are not unique; rows are unique only in combination with the 'rank' column (i.e., there might be multiple poses for each docking run, or none).

Example usage

Using the pandas library (https://pandas.pydata.org/) in Python, we can calculate the number of protein-ligand complexes in the PDBScan22-HQ dataset with a top-ranked pose RMSD to the crystal structure ≤ 2.0 Å in the JAMDA redocking experiment and a molecular weight between 100 Da and 200 Da:

    import pandas as pd

    df = pd.read_csv('PDBScan22-HQ.csv')
    df_poses = pd.read_csv('PDBScan22-HQ_JAMDA_NL_NR_poses.csv')
    df_properties = pd.read_csv('PDBScan22_ligand_properties.csv')

    # Attach ligand properties via the (pdb, name) key
    merged = df.merge(df_properties, how='left', on=['pdb', 'name'])
    # Filter by molecular weight, then attach the top-ranked pose (if any)
    merged = merged[(merged['MW'] >= 100) & (merged['MW'] <= 200)].merge(
        df_poses[df_poses['rank'] == 1], how='left', on=['pdb', 'name'])

    # Top-ranked pose within 2.0 Å of the crystal structure
    nof_successful_top_ranked = (merged['rmsd_ai'] <= 2.0).sum()
    # No top-ranked pose available
    nof_no_top_ranked = merged['rmsd_ai'].isna().sum()

Datasets

PDBScan22.csv: This is the PDBScan22 dataset[1]. This dataset was derived from the PDB[4] (PDB version March 11th, 2022). It contains macromolecule–ligand binding sites (defined by PDB code and ligand identifier) that can be read by the NAOMI library[5,6] and pass basic consistency filters.

PDBScan22-HQ.csv: This is the PDBScan22-HQ dataset[1]. It contains macromolecule–ligand binding sites from the PDBScan22 dataset that pass certain structure quality filters described in our publication[1].

PDBScan22-HQ-ADV-Success.csv: This is a subset of the PDBScan22-HQ dataset that excludes the 336 binding sites where AutoDock Vina[2,3] fails.

PDBScan22-HQ-Macrocycles.csv: This is a subset of the PDBScan22-HQ dataset that excludes the 336 binding sites where AutoDock Vina[2,3] fails and contains only molecules with macrocycles of at least ten atoms.

Properties for PDBScan22

PDBScan22_ligand_properties.csv: Conformation-independent properties of all ligand molecules in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

PDBScan22_StructureProfiler_quality_descriptors.csv: Structure quality descriptors for the binding sites in the PDBScan22 dataset calculated using the StructureProfiler tool[7].

PDBScan22_basic_complex_properties.csv: Simple properties of the binding sites in the PDBScan22 dataset. Properties were calculated using an in-house tool developed with the NAOMI library[5,6].

Properties for PDBScan22-HQ

PDBScan22-HQ_DoGSite3_pocket_descriptors.csv: Binding site descriptors calculated for the binding sites in the PDBScan22-HQ dataset using the DoGSite3 tool[8].

PDBScan22-HQ_molecule_types.csv: Assignment of ligands in the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina fails) to different molecular classes (i.e., drug-like, fragment-like, oligosaccharide, oligopeptide, cofactor, macrocyclic). A detailed description of the assignment can be found in our publication[1].

Docking results on PDBScan22

PDBScan22_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22 dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22 dataset.
For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

Docking results on PDBScan22-HQ

PDBScan22-HQ_JAMDA_NL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_NR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NL_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_JAMDA_NL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_JAMDA_NW_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_poses.csv'.
For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NW_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NW_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_WR_poses.csv'. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_JAMDA_NW_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_JAMDA_WL_NR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_NR_poses.csv'.
For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_WL_NR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_WL_WR.csv: Docking results of JAMDA[1] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_WL_WR_poses.csv'. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_JAMDA_WL_WR_poses.csv: Pose scores and RMSDs for the docking results of JAMDA[1] on the PDBScan22-HQ dataset. For this experiment, the ligand was considered during preprocessing of the binding site, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was enabled.

PDBScan22-HQ_AutoDockVina.csv: Docking results of AutoDock Vina[2,3] on the PDBScan22-HQ dataset. This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_AutoDockVina_poses.csv'. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.

PDBScan22-HQ_AutoDockVina_poses.csv: Pose scores and RMSDs for the docking results of AutoDock Vina[2,3] on the PDBScan22-HQ dataset. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1].
For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.

PDBScan22-HQ-Macrocycles_AutoDockVinaMC.csv: Docking results of AutoDock Vina with macrocycle sampling[2,3] on the PDBScan22-HQ subset with macrocyclic molecules (see 'PDBScan22-HQ-Macrocycles.csv'). This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ-Macrocycles_AutoDockVinaMC_poses.csv'. The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.

PDBScan22-HQ-Macrocycles_AutoDockVinaMC_poses.csv: Pose scores and RMSDs for the docking results of AutoDock Vina[2,3] with enabled macrocycle sampling on the PDBScan22-HQ subset with macrocyclic molecules (see 'PDBScan22-HQ-Macrocycles.csv'). The preprocessing of structures was performed using the JAMDA preprocessing pipeline[1]. For this experiment, the ligand was not considered during preprocessing of the binding site, and all water molecules were removed from the binding site during preprocessing.

Docking with consensus scoring results on PDBScan22-HQ

PDBScan22-HQ_JAMDA_NW_NR_Consensus.csv: Docking and consensus scoring results of JAMDA[1] on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA, and a rescoring (with and without optimization) was performed with AutoDock Vina[2,3]. From the JAMDA pose scores and the AutoDock Vina scores, a consensus score was calculated with the rank-by-rank scheme[9].
This is the general overview for the docking runs; the pose results are given in 'PDBScan22-HQ_JAMDA_NW_NR_Consensus_poses.csv' (AutoDock Vina scoring without optimization) and 'PDBScan22-HQ_JAMDA_NW_NR_ConsensusOpt_poses.csv' (AutoDock Vina with short numerical optimization). For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NW_NR_Consensus_poses.csv: Pose and consensus scores and RMSDs for the docking results of JAMDA[1] and the consensus scoring on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA, and a rescoring without optimization was performed with AutoDock Vina[2,3]. From the JAMDA pose score and the AutoDock Vina score, a consensus score was calculated with the rank-by-rank scheme[9]. For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

PDBScan22-HQ_JAMDA_NW_NR_ConsensusOpt_poses.csv: Pose and consensus scores and RMSDs for the docking results of JAMDA[1] and the consensus scoring on the PDBScan22-HQ dataset (without the 336 binding sites where AutoDock Vina docking fails). Here, the docking was performed with JAMDA, and a rescoring with short numerical optimization was performed with AutoDock Vina[2,3]. From the JAMDA pose score and the optimized AutoDock Vina score, a consensus score was calculated with the rank-by-rank scheme[9].
For this experiment, the ligand was not considered during preprocessing of the binding site, all water molecules were removed from the binding site during preprocessing, and the binding site restriction mode (i.e., biasing the docking towards the crystal ligand position) was disabled.

References

[1] Flachsenberg, F.; Ehrt, C.; Gutermuth, T.; Rarey, M. Redocking the PDB. J. Chem. Inf. Model., 2023, https://doi.org/10.1021/acs.jcim.3c01573
[2] Eberhardt, J.; Santos-Martins, D.; Tillack, A. F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model., 2021, 61, pp 3891–3898, https://doi.org/10.1021/acs.jcim.1c00203
[3] Trott, O.; Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem., 2010, 31, pp 455–461, https://doi.org/10.1002/jcc.21334
[4] Berman, H. M.; Westbrook, J.; Feng, Z.; Gilliland, G.; Bhat, T. N.; Weissig, H.; Shindyalov, I. N.; Bourne, P. E. The Protein Data Bank. Nucleic Acids Res., 2000, 28, pp 235–242, https://doi.org/10.1093/nar/28.1.235
[5] Urbaczek, S.; Kolodzik, A.; Fischer, J. R.; Lippert, T.; Heuser, S.; Groth, I.; Schulz-Gasch, T.; Rarey, M. NAOMI: On the Almost Trivial Task of Reading Molecules from Different File formats. J. Chem. Inf. Model., 2011, 51, pp 3199–3207, https://doi.org/10.1021/ci200324e
[6] Urbaczek, S.; Kolodzik, A.; Groth, I.; Heuser, S.; Rarey, M. Reading PDB: Perception of Molecules from 3D Atomic Coordinates. J. Chem. Inf. Model., 2013, 53, pp 76–87, https://doi.org/10.1021/ci300358c
[7] Meyder, A.; Kampen, S.; Sieg, J.; Fährrolfes, R.; Friedrich, N.; Flachsenberg, F.; Rarey, M. StructureProfiler: an all-in-one tool for 3D protein structure profiling. Bioinformatics, 2019, 35, pp 874–876, https://doi.org/10.1093/bioinformatics/bty692
[8] Graef, J.; Ehrt, C.; Rarey, M. Binding Site Detection Remastered: Enabling Fast, Robust, and Reliable Binding Site Detection and Descriptor Calculation with DoGSite3. J. Chem. Inf. Model., 2023, 63, pp 3128–3137, https://doi.org/10.1021/acs.jcim.3c00336
[9] Wang, R.; Wang, S. How Does Consensus Scoring Work for Virtual Library Screening? An Idealized Computer Experiment. J. Chem. Inf. Comput. Sci., 2001, 41, pp 1422–1426, https://doi.org/10.1021/ci010025x
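
As a rough illustration of the rank-by-rank consensus scheme[9] mentioned for the consensus scoring files, the sketch below averages the per-function ranks of a set of poses. It is a generic sketch, not the authors' implementation: the function name `rank_by_rank`, the column names `score_a` and `score_b`, and the toy scores are hypothetical, and it assumes that a lower (more negative) score is better for every scoring function.

```python
import pandas as pd


def rank_by_rank(poses: pd.DataFrame, score_cols: list[str]) -> pd.Series:
    """Mean of the per-function ranks for each pose (lower mean rank = better)."""
    # Assumption: lower scores are better, so rank 1 goes to the lowest score.
    ranks = poses[score_cols].rank(method="average", ascending=True)
    return ranks.mean(axis=1)


# Toy poses of a single docking run with made-up scores from two functions
toy = pd.DataFrame(
    {
        "score_a": [-9.1, -8.0, -7.5],  # hypothetical scores, function A
        "score_b": [-6.0, -5.5, -7.2],  # hypothetical scores, function B
    }
)

toy["consensus"] = rank_by_rank(toy, ["score_a", "score_b"])
# Row label of the pose with the best (lowest) mean rank
best = toy["consensus"].idxmin()
```

For a table holding several docking runs, the same function could be applied per run, for example inside a `groupby(['pdb', 'name'])`, so that poses are only ranked against poses of the same binding site.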

Created:

2023-12-06
