
Pre-computed Compositional and Structural Features of Materials Project Time Split Data for use with Generative Materials Benchmarking Metrics

Source: Figshare | Updated: 2022-08-06 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Compositional_and_Structural_Fingerprints_of_Materials_Project_Time_Split_Data_for_use_with_Generative_Materials_Benchmarking_Metrics/20444109
Description:
Short Description
This is a supporting dataset for matbench-genmetrics [docs] [repo], a set of generative materials benchmarking metrics. It contains compositional and structural fingerprints. Additional files include the space group number for each structure and the "heat map" values for the empirical distribution of modified Pettifor scale-encoded (1D periodic table) values.

Fingerprints
The compositional (Magpie) and structural (CrystalNN) fingerprints* are produced in fingerprint_snapshot.py using mp-time-split data [docs] [repo] [figshare] and are given in comp_fingerprints.csv (132 features) and struct_fingerprints.csv (61 features), respectively. Each file has an additional first column, material_id, which contains the Materials Project material_id, so the files have 133 and 62 columns in total, respectively. Each file has 40476 entries plus a header row with labels, for 40477 rows in total.

The primary purpose of these datasets is to avoid repeating lengthy calculations each time a matbench-genmetrics benchmark is computed; only the generated structures then need to be featurized. Using 6 physical cores (12 virtual cores as reported by multiprocessing.cpu_count()), the total runtime is approximately 50 minutes for the compositional fingerprinting and approximately 140 minutes for the structural fingerprinting. The benchmarks can be used with materials generative models such as xtal2png+Imagen.

*The use of the Magpie and CrystalNN featurizers is based on the coverage metric from CDVAE [repo] [paper].

A small set of dummy data for testing purposes is also included (dummy_comp_fingerprints.csv and dummy_struct_fingerprints.csv).

Space Group Number
The first column of space_group_number.csv is material_id, same as above, and the second column is space_group_number, as determined by the get_space_group_info() method for each pymatgen Structure object. A corresponding dummy file is provided. For the generation of this dataset, see validity_snapshot.py.
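The fingerprint and space group files described above share the material_id key, so they can be combined per material. The following is a minimal sketch of that join using only the Python standard library. It is not taken from matbench-genmetrics; the helper names (read_keyed_csv, join_on_material_id), the toy feature columns (feat_a, feat_b), and the toy material IDs are assumptions made for illustration, and the in-memory strings stand in for the real CSV files.

```python
import csv
import io


def read_keyed_csv(fileobj, key="material_id"):
    """Read a CSV whose first column is `key`.

    Returns (feature_names, rows), where rows maps each key value to a
    dict of the remaining columns (values are kept as strings).
    """
    reader = csv.DictReader(fileobj)
    if not reader.fieldnames or reader.fieldnames[0] != key:
        raise ValueError(f"expected first column {key!r}, got {reader.fieldnames}")
    feature_names = reader.fieldnames[1:]
    rows = {row[key]: {name: row[name] for name in feature_names} for row in reader}
    return feature_names, rows


def join_on_material_id(fingerprints, space_groups):
    """Pair each fingerprint vector with its space group number.

    Materials missing from `space_groups` are skipped.
    """
    joined = {}
    for material_id, features in fingerprints.items():
        if material_id in space_groups:
            vector = [float(value) for value in features.values()]
            number = int(space_groups[material_id]["space_group_number"])
            joined[material_id] = (vector, number)
    return joined


# Toy stand-ins for comp_fingerprints.csv and space_group_number.csv.
# Column names feat_a/feat_b and the IDs below are hypothetical.
toy_fingerprints = io.StringIO(
    "material_id,feat_a,feat_b\n"
    "mp-1,0.5,1.0\n"
    "mp-2,0.25,2.0\n"
)
toy_space_groups = io.StringIO(
    "material_id,space_group_number\n"
    "mp-1,225\n"
    "mp-2,62\n"
)

names, fingerprints = read_keyed_csv(toy_fingerprints)
_, space_groups = read_keyed_csv(toy_space_groups)
joined = join_on_material_id(fingerprints, space_groups)
print(len(names), "feature columns;", len(joined), "joined materials")
```

With the real files opened via open(path, newline=""), the same reader would be expected to report 132 feature names for comp_fingerprints.csv and 61 for struct_fingerprints.csv, with 40476 rows each, per the counts stated above.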
Modified Pettifor Scale
Each pymatgen Composition object is converted to its fractional_composition counterpart, and the results are summed collectively via np.sum() to get the fractional prevalence of each element across each of the datasets. The elements are then encoded in the 1D periodic table called the modified Pettifor scale. The columns of mod_petti_contributions.csv are symbol (the element symbol), mod_petti (the modified Pettifor scale value), and contribution (the fractional contribution to the full dataset). A dummy file is also provided for testing and debugging purposes. For the generation of this dataset, see validity_snapshot.py. The mapping dictionary is a slightly modified version of ElMD's implementation.
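The contribution calculation described above can be sketched in plain Python. This is an illustration only, not the code in validity_snapshot.py: plain dicts of element amounts stand in for pymatgen Composition objects, the function names are hypothetical, and the final normalization (dividing by the grand total so the contributions sum to 1) is an assumption about how "fractional contribution to the full dataset" is defined. The mapping from element symbol to modified Pettifor value is not reproduced here.

```python
from collections import defaultdict


def fractional_composition(amounts):
    """Normalize element amounts so they sum to 1 (e.g. Fe2O3 -> Fe 0.4, O 0.6)."""
    total = sum(amounts.values())
    if total <= 0:
        raise ValueError("composition must have a positive total amount")
    return {element: amount / total for element, amount in amounts.items()}


def element_contributions(compositions):
    """Sum fractional compositions over a dataset and normalize the result.

    Assumption: contributions are divided by the grand total so that they
    sum to 1 across all elements.
    """
    summed = defaultdict(float)
    for amounts in compositions:
        for element, fraction in fractional_composition(amounts).items():
            summed[element] += fraction
    grand_total = sum(summed.values())
    return {element: value / grand_total for element, value in summed.items()}


# Toy dataset of two compositions: NaCl and Fe2O3.
toy_dataset = [{"Na": 1, "Cl": 1}, {"Fe": 2, "O": 3}]
for symbol, contribution in sorted(element_contributions(toy_dataset).items()):
    print(symbol, round(contribution, 3))
```

Each composition contributes a total weight of 1 regardless of formula size, so a large unit cell does not count more than a small one; pairing each symbol with its mod_petti value would then give the rows of a file shaped like mod_petti_contributions.csv.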
Created:
2022-08-06