Pre-computed Compositional and Structural Features of Materials Project Time Split Data for use with Generative Materials Benchmarking Metrics

Source: DataCite Commons | Updated: 2022-08-10 | Indexed: 2024-07-29
Download link:
https://figshare.com/articles/dataset/Compositional_and_Structural_Fingerprints_of_Materials_Project_Time_Split_Data_for_use_with_Generative_Materials_Benchmarking_Metrics/20444109
Resource summary:
Short Description

This is a supporting dataset for matbench-genmetrics [docs] [repo], a set of generative materials benchmarking metrics. It contains compositional and structural fingerprints. Additional files give the space group number of each structure and the "heat map" values for the empirical distribution of modified Pettifor scale-encoded (1D periodic table) values.

Fingerprints

The compositional (Magpie) and structural (CrystalNN) fingerprints* are produced in fingerprint_snapshot.py using mp-time-split data [docs] [repo] [figshare] and are given in comp_fingerprints.csv (132 features) and struct_fingerprints.csv (61 features), respectively. Each file has an additional first column, material_id, which contains the Materials Project material_id, so the files have 133 and 62 columns in total, respectively. There are 40476 entries plus a header row with labels, for 40477 rows in total.

The primary purpose of these datasets is to avoid repeating lengthy calculations each time a matbench-genmetrics benchmark is computed; with the reference fingerprints precomputed, only the generated structures need to be featurized. Using 6 physical cores (12 virtual cores, as reported by multiprocessing.cpu_count()), compositional and structural fingerprinting take approximately 50 min and 140 min, respectively. The benchmarks can be used with materials generative models such as xtal2png + Imagen.
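The CSV layout described above can be read with the standard library alone. The following is a minimal sketch, not code from matbench-genmetrics: it uses a tiny made-up three-feature sample in place of the real 132-column file, and the helper names load_fingerprints and nearest_reference are hypothetical. It checks the material_id-plus-features layout and then finds the precomputed reference fingerprint closest to one generated fingerprint, which is the kind of comparison that makes precomputing the reference side worthwhile. Plain Euclidean distance is an assumption here; the actual metrics may scale or compare fingerprints differently.

```python
import csv
import io
import math

# Tiny in-memory stand-in for comp_fingerprints.csv: the first column is the
# Materials Project material_id, followed by feature columns. The real file has
# 132 Magpie features; three made-up features are used here for brevity.
SAMPLE_CSV = """material_id,f0,f1,f2
mp-1,0.0,1.0,2.0
mp-2,1.0,1.0,1.0
mp-3,3.0,0.0,4.0
"""


def load_fingerprints(handle, n_features):
    """Hypothetical helper: read a fingerprint CSV into {material_id: [floats]}."""
    reader = csv.reader(handle)
    header = next(reader)
    # Expect material_id plus n_features feature columns (133 or 62 for the real files).
    if len(header) != n_features + 1 or header[0] != "material_id":
        raise ValueError(f"unexpected header with {len(header)} columns")
    return {row[0]: [float(x) for x in row[1:]] for row in reader}


def nearest_reference(query, reference):
    """Hypothetical helper: (material_id, Euclidean distance) of the closest reference."""
    return min(
        ((mid, math.dist(query, fp)) for mid, fp in reference.items()),
        key=lambda pair: pair[1],
    )


reference = load_fingerprints(io.StringIO(SAMPLE_CSV), n_features=3)
# Only the generated structure needs featurizing; this vector is made up.
generated = [0.9, 1.0, 1.2]
# Prints the number of reference entries and the closest match (mp-2 here).
print(len(reference), nearest_reference(generated, reference))
```

For the real files, open them with open("comp_fingerprints.csv", newline="") and pass n_features=132 (or n_features=61 for struct_fingerprints.csv).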
*The use of Magpie and CrystalNN featurizers is based on the coverage metric from CDVAE [repo] [paper]. A small set of dummy data for testing purposes is also included (dummy_comp_fingerprints.csv and dummy_struct_fingerprints.csv).

Space Group Number

The first column of space_group_number.csv is material_id, as above, and the second column is space_group_number, as determined by the get_space_group_info() method of each pymatgen Structure object. A corresponding dummy file is provided. For the generation of this dataset, see validity_snapshot.py.

Modified Pettifor Scale

Each pymatgen Composition object is converted to its fractional_composition counterpart, and these are then summed via np.sum() to get the fractional prevalence of each element across each dataset. The elements are then encoded on the 1D periodic table called the modified Pettifor scale. The columns of mod_petti_contributions.csv are symbol (the element symbol), mod_petti (the modified Pettifor scale value), and contribution (the fractional contribution to the full dataset). A dummy file is also provided for testing and debugging purposes. For the generation of this dataset, see validity_snapshot.py. The mapping dictionary is a slightly modified version of ElMD's implementation.
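The modified Pettifor steps can be sketched in plain Python. This is an illustrative sketch, not the validity_snapshot.py code: compositions are plain dicts standing in for pymatgen Composition objects, the hypothetical fractional() helper mimics Composition.fractional_composition, a Python sum replaces np.sum(), and PLACEHOLDER_MOD_PETTI holds made-up ranks rather than real modified Pettifor values (use ElMD's mapping in practice). Dividing the summed prevalences by the number of compositions so that the contributions total 1 is also an assumption about how the published values are scaled.

```python
from collections import defaultdict

# Illustrative placeholder ranks only: these are NOT the real modified Pettifor
# scale values. Use ElMD's mapping (or the one shipped with matbench-genmetrics).
PLACEHOLDER_MOD_PETTI = {"O": 1, "Si": 2, "Fe": 3}


def fractional(composition):
    """Normalize element amounts to sum to 1 (stand-in for fractional_composition)."""
    total = sum(composition.values())
    return {el: amt / total for el, amt in composition.items()}


def mod_petti_contributions(compositions, mod_petti):
    """Sum fractional compositions and return (symbol, mod_petti, contribution) rows.

    Assumption: totals are divided by the number of compositions so the
    contributions sum to 1.
    """
    totals = defaultdict(float)
    for comp in compositions:
        for el, frac in fractional(comp).items():
            totals[el] += frac
    n = len(compositions)
    rows = [(el, mod_petti[el], total / n) for el, total in totals.items()]
    # Order rows along the 1D "periodic table" defined by the scale.
    return sorted(rows, key=lambda row: row[1])


# Hypothetical toy dataset: SiO2 and Fe2O3 as dicts of element amounts.
dataset = [{"Si": 1, "O": 2}, {"Fe": 2, "O": 3}]
for symbol, petti, contribution in mod_petti_contributions(dataset, PLACEHOLDER_MOD_PETTI):
    print(symbol, petti, round(contribution, 4))
```

Each row mirrors the symbol, mod_petti, and contribution columns of mod_petti_contributions.csv; plotting contribution against mod_petti gives the 1D "heat map" described above.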
Provided by:
figshare
Created:
2022-08-06