
Data for HEAD_TED

DataCite Commons | Updated 2025-06-01 | Indexed 2025-05-07
Download link:
https://figshare.com/articles/dataset/Data_for_HEAD_TED/27826488/3
Description:
# HEAD_TED Data Collections

## Collections Check List
- [x] Raw_conformations.tar.gz
- [x] Mininplace_conformations.tar.gz
- [x] GM-5K_min.sdf.tar.gz
- [x] GM-5K.csv
- [x] GM-1K_min.sdf.tar.gz
- [x] GM-1K.csv
- [x] DFT-5K.sdf.tar.gz
- [x] DFT-5K.csv

## Details
### Pre-optimization conformations & Post-optimization conformations
Raw conformations generated by AI models without any optimization (i.e., pre-optimization) are provided in `Raw_conformations.tar.gz`. This archive contains five independent `.sdf` files, each corresponding to a different AI model used to generate the molecules.

Optimized conformations, which are raw conformations refined using the OPLS3e force field with binding pockets fixed, are provided in `Mininplace_conformations.tar.gz`. Like the raw conformations, this archive contains five independent `.sdf` files, named after the AI models used to generate them.

Within each `.sdf` file, the headers contain a unique index for each conformation. These indices may be duplicated across different models, so always reference a conformation by its model together with its index.

### GM-5K and GM-1K
The conformations in GM-5K and GM-1K are extracted from the raw conformations generated by AI models.
GM-5K and GM-1K are provided as `GM-5K_min.sdf.tar.gz` and `GM-1K_min.sdf.tar.gz`, respectively. These files contain raw conformations optimized with the MMFF94 force field with binding pockets fixed.

DFT single-point energies (before and after optimization), along with HEAD results, are included in the `.csv` files `GM-1K.csv` and `GM-5K.csv`.
The columns in these `.csv` files are as follows:
- **mol_id**: The name in the header of each molecule block in the `.sdf` file.
- **model**: The model that generated the molecule.
- **pdb**: The PDB ID of the complex from which the target pocket originates.
- **DFT_single_point_energy_raw**: Single-point energy (in kcal/mol) of the unoptimized raw conformation, calculated using DFT.
- **DFT_single_point_energy_min**: Single-point energy (in kcal/mol) of the conformation optimized by MMFF94, calculated using DFT.
- **dE**: The energy difference between the raw and optimized conformations, $\Delta E = E_{raw} - E_{opt}$, where $E_{raw}$ and $E_{opt}$ correspond to `DFT_single_point_energy_raw` and `DFT_single_point_energy_min`.
- **HEAD_invalid_atoms**: Atom-level details if the conformation is detected as invalid; otherwise `None`. For example:
    - `[(2, 'C', 40.962)]`: Indicates that the second carbon atom (indexing starts at 1) is detected as invalid due to a high-energy response of 40.962 kcal/mol. Note that this energy is only a reference value and may not be precise.
- **HEAD_invalidity**: Indicates the validity of the conformation:
    - `0`: Valid conformation
    - `1`: Invalid conformation
    - `-1`: Unsupported conformation, which may contain elements outside {H, C, N, O, F, S, Cl} or may have encountered an unexpected error during loading
- **information_entropy_label**: A value of `1` indicates an invalid conformation detected **only** by the information entropy approach; otherwise, `0`.
- **PB_invalidity**: A value of `1` indicates an invalid conformation detected by PoseBusters; otherwise, `0`.

### DFT-5K
The conformations of the DFT-5K dataset are provided in the `DFT-5K.sdf.tar.gz` archive. All molecules in this file have been optimized using DFT with appropriate constraints on specific dihedrals.
Their single-point energies are calculated using a higher-level DFT method than the one used for optimization.

Each torsion fragment in DFT-5K is represented by 24 conformations, grouped under the same name in the header, corresponding to dihedral angle values ranging from -180° to 180° in 15° increments (-180° is equivalent to 180°). Each molecule block in the `.sdf` file includes the following properties:
- **TORSION_ATOMS**: Indices of the atom quartet defining the specific dihedral, starting from 0.
- **DIHEDRAL_ANGLE**: The value of the dihedral being investigated, in degrees.
- **DFT_SINGLE_POINT_ENERGY**: Single-point energy (in kcal/mol) calculated using DFT.

In addition to the `.sdf` file, the dataset includes a `DFT-5K.csv` file containing the following columns:
- **DFT_id**: The name in the header of each molecule block in the `.sdf` file. Conformations of the same torsion fragment at different dihedral angles share the same DFT_id.
- **xTB_dih_relative_energies**: A string of relative energies, joined by `-`, for the conformations optimized with constraints and calculated using GFN2-xTB. The order of the energies corresponds to dihedral angles from -180° to 180° in 15° increments (24 values in total). **Note**: These are relative energies, obtained by subtracting the minimum energy among the 24 conformations, and therefore differ from the single-point energies listed in the previous sections.
- **model_dih_relative_energies**: Same format as above, but the relative energies are predicted by TED-Model.
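The two string-encoded fields described above (`HEAD_invalid_atoms` and the `*_dih_relative_energies` columns) need a small amount of parsing before use. The following is a minimal sketch, not part of the dataset's own tooling. The function names are hypothetical, and it rests on three assumptions that the description does not confirm: that `HEAD_invalid_atoms` cells are stored as Python-literal strings exactly like the example shown, that the energies are written in plain decimal notation (a value such as `1e-05` would break a split on `-`), and that the 24-point grid starts at -180° and ends at 165°.

```python
import ast


def parse_invalid_atoms(cell):
    """Turn a HEAD_invalid_atoms cell into a list of (index, element, energy).

    Assumes the cell is either the string 'None' (or empty) or a Python-literal
    list such as "[(2, 'C', 40.962)]".
    """
    if cell is None:
        return []
    text = str(cell).strip()
    if text in ("", "None"):
        return []
    return [(int(i), str(el), float(e)) for i, el, e in ast.literal_eval(text)]


def parse_relative_energies(cell, n_expected=24):
    """Split a '-'-joined string of relative energies into floats.

    Relative energies are non-negative by construction (the minimum is
    subtracted), so '-' is treated purely as a separator here.
    """
    values = [float(tok) for tok in str(cell).split("-")]
    if len(values) != n_expected:
        raise ValueError(f"expected {n_expected} energies, got {len(values)}")
    return values


def dihedral_grid(start=-180, step=15, count=24):
    """Dihedral angles in degrees; the -180..165 convention is an assumption."""
    return [start + step * k for k in range(count)]


def lowest_energy_angle(cell):
    """Return the grid angle whose relative energy is smallest."""
    energies = parse_relative_energies(cell)
    angles = dihedral_grid()
    return min(zip(energies, angles))[1]
```

When loading the `.csv` files with a dataframe library, these helpers would typically be applied per row; check a few rows by hand first to confirm that the assumptions above hold for the actual files.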
Provider:
figshare
Created:
2025-04-08