CheckMyBlob ligand data set (CMB)
Indexed in NIAID Data Ecosystem: 2026-05-01
Download link:
https://zenodo.org/record/1040842
Resource description:
Ligand data set prepared for the CheckMyBlob study, described in "Automatic recognition of ligands in electron density by machine learning methods" by Kowiel, M. et al. It contains only structures determined by X-ray diffraction at a resolution of 4.0 Å or better. Entries with an R factor above 0.3 or ligands with occupancy below 0.3 (according to wwPDB validation reports) were rejected. Only ligands with at least 2 non-H atoms were considered, and structures whose ligands fit the electron density map poorly (RSCC < 0.6, RSZO <= 1, RSZD > 6.0) were removed.

Besides applying these quality filters, we removed from the experimental data set all moieties that are not considered proper ligands: unknown species, water molecules, standard amino acids, and selected nucleotides. Connected ligands (as per the PDB naming convention) were labeled as alphabetically ordered strings of hetero-compound codes (e.g., NAG-NAG-NAG-NAG). Finally, the data set was limited to the 200 most popular ligands. The resulting data set consists of 219,986 examples, with individual ligand counts ranging from 48,490 examples for SO4 (sulfate ion) to 106 for A2G (N-acetyl-2-deoxy-2-amino-galactose). More details on data selection can be found in Kowiel et al.
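The selection rules above can be sketched as a small filtering pipeline. This is a minimal illustration, not the CheckMyBlob code: the `LigandInstance` record and its field names are hypothetical, treating the RSCC/RSZO/RSZD thresholds as "reject if any condition holds" is an assumption, and the exclusion set is partial because the "selected nucleotides" are not enumerated in the description. Dropping a connected ligand when any of its components is excluded is also an assumption.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; field names are illustrative, not the CMB file schema.
@dataclass
class LigandInstance:
    codes: list[str]     # hetero-compound codes of the (possibly connected) ligand
    resolution: float    # in Å; smaller is better
    r_factor: float
    occupancy: float
    heavy_atoms: int     # number of non-H atoms
    rscc: float
    rszo: float
    rszd: float


# Partial, assumed exclusion list: water, unknown species, standard amino acids.
# The "selected nucleotides" are not listed in the description, so they are omitted.
EXCLUDED = {
    "HOH", "UNX", "UNL", "UNK",
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def passes_quality_filters(x: LigandInstance) -> bool:
    """Apply the thresholds quoted in the data-set description."""
    return (
        x.resolution <= 4.0
        and x.r_factor <= 0.3
        and x.occupancy >= 0.3
        and x.heavy_atoms >= 2
        # Assumed: a ligand is rejected if any density-fit criterion fails.
        and not (x.rscc < 0.6 or x.rszo <= 1 or x.rszd > 6.0)
    )


def ligand_label(codes: list[str]) -> str:
    """Connected ligands become alphabetically ordered, dash-joined code strings."""
    return "-".join(sorted(codes))


def build_labels(instances: list[LigandInstance], top_n: int = 200) -> list[str]:
    """Filter, label, and keep only examples of the top_n most frequent labels."""
    kept = [
        ligand_label(x.codes)
        for x in instances
        if passes_quality_filters(x) and not any(c in EXCLUDED for c in x.codes)
    ]
    popular = {label for label, _ in Counter(kept).most_common(top_n)}
    return [label for label in kept if label in popular]
```

With `top_n=200` this mirrors the "200 most popular ligands" cut; ties at the cutoff are broken by `Counter.most_common`'s first-seen order, which the original study may handle differently.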
For machine learning (classification), the target attribute is res_name.
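A minimal sketch of separating the `res_name` target from the remaining attributes, assuming the data are distributed as a comma-separated file with a header row; the actual file name, delimiter, and other column names on Zenodo are not given here and may differ. All non-target values are left as strings and would still need numeric conversion before training a classifier.

```python
import csv
from collections import Counter
from typing import Iterable


def split_features_target(rows: Iterable[dict[str, str]], target: str = "res_name"):
    """Separate the class label (res_name by default) from the remaining columns."""
    X, y = [], []
    for row in rows:
        row = dict(row)            # copy so the caller's row is not mutated
        y.append(row.pop(target))  # class label, e.g. "SO4" or "NAG-NAG-NAG-NAG"
        X.append(row)              # remaining attributes, still as strings
    return X, y


def load(path: str):
    """Read an (assumed) CSV export of the data set and split it."""
    with open(path, newline="") as fh:
        return split_features_target(csv.DictReader(fh))


def class_counts(y: list[str]) -> Counter:
    """Per-ligand example counts, useful for checking class imbalance."""
    return Counter(y)
```

Checking `class_counts(y).most_common()` is a quick way to see the strong imbalance described above (SO4 versus the rarest classes such as A2G) before choosing a stratified split or class weighting.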
Created:
2023-08-08



