sharedata-to-reproduce-LACL
Source: DataCite Commons. Updated 2023-10-26; indexed 2024-08-18.
Download link:
https://figshare.com/articles/dataset/sharedata-to-reproduce-LACL/24445129/1
Description:
Download the datasets used in the paper and extract the zip file into the <code>./data</code> folder of the code repository https://github.com/parkyjmit/LACL. Both QM9 and QMugs should be saved in that folder under their own names. The conformations for all data are pickled after preprocessing.
QM9:
<code>qm9_all.pickle</code>: List of dictionaries with properties, one dictionary per molecule. Each dictionary also contains the Cartesian coordinates of the MMFF conformations and the MMFF potential.
<code>qm9_all_cgcf.pkl</code>: List of RDKit molecules with the Cartesian coordinates of CGCF-ConfGen conformations, calculated with the official implementation of CGCF-ConfGen.
QMugs:
<code>QMugs_20_energy.pkl</code>: List of dataframes containing identifiers, properties, SMILES, and RDKit mols with at most 20 heavy atoms.
<code>QMugs_20_energy_mmff.pkl</code>: List of RDKit molecules with the Cartesian coordinates of MMFF conformations, calculated by RDKit MMFF optimization.
<code>QMugs_20_energy_cgcf.pkl</code>: List of RDKit molecules with the Cartesian coordinates of CGCF-ConfGen conformations, calculated with the official implementation of CGCF-ConfGen.
<code>QMugs_{num}_energy_test.pkl</code>: List of dataframes containing identifiers, properties, SMILES, and RDKit mols with <code>num</code> heavy atoms. <code>ex</code> denotes mols with more than 40 heavy atoms.
<code>QMugs_{num}_energy_mmff.pkl</code>: List of RDKit molecules with <code>num</code> heavy atoms, with the Cartesian coordinates of MMFF conformations.
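The files described above are ordinary Python pickles, so they can be inspected with the standard library. The following is a minimal sketch, not code from the LACL repository: the helper names (<code>dataset_file</code>, <code>load_pickle</code>, <code>qmugs_split_files</code>) are hypothetical, and the subfolder names <code>QM9</code> and <code>QMugs</code> are an assumption based on the instruction to save each dataset under its own name. Unpickling the files that hold RDKit molecules or dataframes requires RDKit and pandas to be installed, and pickles should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path

# Assumption: the script runs from the root of the LACL repository,
# with the figshare zip extracted into ./data.
DATA_ROOT = Path("./data")


def dataset_file(dataset: str, filename: str, root: Path = DATA_ROOT) -> Path:
    """Build root/<dataset>/<filename>. The folder layout is an assumption."""
    return root / dataset / filename


def qmugs_split_files(num: str) -> tuple[str, str]:
    """Fill in the documented QMugs_{num}_... file name patterns.

    `num` is a heavy-atom count written as a string, or "ex" for
    molecules with more than 40 heavy atoms.
    """
    return (f"QMugs_{num}_energy_test.pkl", f"QMugs_{num}_energy_mmff.pkl")


def load_pickle(path: Path):
    """Unpickle one file and return whatever object it contains."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


if __name__ == "__main__":
    # "QM9" as the subfolder name is an assumption, so check before loading.
    path = dataset_file("QM9", "qm9_all.pickle")
    if path.exists():
        molecules = load_pickle(path)  # described as a list of dictionaries
        print(len(molecules), type(molecules[0]))
    else:
        print(f"{path} not found; adjust DATA_ROOT or the folder name")
```

Checking <code>type(...)</code> and <code>len(...)</code> on the loaded object before using it is a cheap way to confirm that a file matches its description.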
Provider:
figshare
Created:
2023-10-26



