WyFormer generated structures
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/WyFormer_generated_structures/29094701/2
Description:
WyFormer generated datasets

Structures generated by WyFormer, with various post-processing. Used in the ICML 2025 paper "Wyckoff Transformer: Generation of Symmetric Crystals".<br>
The folder structure is as follows: the first level is the dataset used for training WyFormer (only the train and validation parts were used). The subsequent levels correspond to transformations of the data.

<i>mp_20/WyckoffTransformer</i> 10k formally valid Wyckoff representations generated by WyFormer trained on the MP-20 dataset.
<i>mp_20/WyckoffTransformer/DiffCSP++10k</i> 9999 structures obtained with DiffCSP++; it failed for one Wyckoff representation, and we consider that structure unstable. <b>Can be considered as the "official" WyFormer sample.</b>
<i>mp_20/WyckoffTransformer/DiffCSP++10k/CHGNet_free/DFT</i> CHGNet pre-relaxation followed by DFT relaxation; for some structures the DFT relaxation failed, and we consider them unstable. The relaxation was obtained using the MP-compatible <code>MPGGADoubleRelaxStaticMaker</code>. Note that the material indices unfortunately got permuted at the CHGNet pre-relaxation step. Used in Table 1. <b>Can be considered as the "official" WyFormer DFT-relaxed sample.</b>
<i>mp_20/WyckoffTransformer/DiffCSP++10k/CHGNet_free/DFT-GGA-relax-1</i> same as above, but relaxed with a single invocation of <code>MPRelaxSet</code>. This is less precise and not strictly compatible with Materials Project, but it is the same procedure as reported in the FlowMM paper and code. Used in Table 1.
<i>mp_20/WyckoffTransformer/DiffCSP++/</i> 1k structures obtained with DiffCSP++.
<i>mp_20/WyckoffTransformer/DiffCSP++/DFT/</i> DFT relaxation of 105 <i>novel and unique</i> structures with <code>MPGGADoubleRelaxStaticMaker</code>.
<i>mp_20/WyckoffTransformer/CrySPR/CHGNet_fix/</i> 1k structures obtained with CrySPR and CHGNet, <i>with a constraint during the relaxation that maintained the Wyckoff positions</i>.
<i>mp_20/WyckoffTransformer/CrySPR/CHGNet_fix/DFT/</i> DFT relaxation of 105 <i>novel and unique</i> structures with <code>MPGGADoubleRelaxStaticMaker</code>.
<i>mpts_52/WyckoffTransformer/CrySPR/CHGNet_fix</i> 1k structures generated with WyFormer trained on the MPTS-52 dataset, then processed with CrySPR and CHGNet, <i>with a constraint during the relaxation that maintained the Wyckoff positions</i>.

Format description

<code>structure</code> - <code>pymatgen.core.structure.Structure</code>
<code>group</code>, <code>species</code>, <code>numIons</code>, <code>sites</code> - arguments to <code>pyxtal.from_random</code>. For <code>*/WyckoffTransformer/data.csv.gz</code> they were generated with WyFormer; for the rest they were obtained from structures with <code>pyxtal.from_seed</code>. Note that the indexing within those fields is by chemical element, not by Wyckoff position.
<code>site_symmetries</code>, <code>elements</code>, <code>multiplicity</code>, <code>wyckoff_letters</code>, <code>sites_enumeration</code>, <code>dof</code> - information about the Wyckoff positions, indexed by Wyckoff position. The <code>dof</code> is the number of degrees of freedom of the Wyckoff position, i.e. the number of free parameters in the Wyckoff position. <code>sites_enumeration</code> enumerates the Wyckoff positions that share the same site symmetry; see the paper for details. For example, for space group <code>2</code> aka <code>P-1</code>, Wyckoff position <code>a</code> has site symmetry <code>-1</code> and enumeration <code>0</code>, while <code>b</code> has site symmetry <code>-1</code> and enumeration <code>1</code>.
<code>sites_enumeration_augmented</code> - possible variants of the enumeration; they depend on the arbitrary choice of the space group Euclidean normalizer, e.g. the unit cell center. See the preprint for details.
<code>smact_validity</code> - "Compositional Validity" computed with SMACT. Not all structures in MP-20 conform to this criterion.
<code>structural_validity</code> - "Structural Validity" introduced by CDVAE: whether any two atoms are closer than 0.5 Angstroms.
<code>cdvae_e</code> - energy predicted by the model included in CDVAE, used for the EMD(E) distribution similarity metric.
<code>chgnet_energy_per_atom</code> - energy per atom from the CHGNet relaxation.
<code>chgnet_e_above_hull_corrected</code> - energy above hull from the CHGNet relaxation, taking into account the MP energy correction.
<code>dft_e_uncorrected</code> - raw potential energy from the DFT relaxation.
<code>dft_e_corrected</code> - potential energy from the DFT relaxation, corrected with <code>MaterialsProject2020Compatibility</code>.
<code>dft_e_above_hull_corrected</code> - energy above hull from the DFT relaxation, computed using <code>2023-02-07-ppd-mp.pkl.gz</code>, distributed by matbench-discovery, as the reference.
<code>entry</code> - <code>pymatgen.entries.computed_entries.ComputedEntry</code> containing the results of the DFT run.
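The difference between the two indexing schemes in the format description (by Wyckoff position versus by chemical element) can be illustrated with a minimal sketch. It is not the authors' code. The helper names `enumerate_by_site_symmetry` and `group_by_element` are hypothetical, the enumeration is assumed to follow Wyckoff letter order (which agrees with the <code>P-1</code> example but is otherwise defined in the paper), and the `"4a"`-style site strings are an assumed serialization that may differ from what the files actually store. The element and multiplicity values in the usage lines are made up for illustration.

```python
from collections import defaultdict


def enumerate_by_site_symmetry(wyckoff_positions):
    """Number Wyckoff positions within each site-symmetry class.

    wyckoff_positions: list of (letter, site_symmetry) pairs, assumed to be
    given in Wyckoff letter order (a, b, c, ...).
    Returns a dict mapping letter -> enumeration within its site symmetry.
    """
    counters = defaultdict(int)
    enumeration = {}
    for letter, site_symmetry in wyckoff_positions:
        enumeration[letter] = counters[site_symmetry]
        counters[site_symmetry] += 1
    return enumeration


def group_by_element(elements, multiplicity, wyckoff_letters):
    """Regroup per-Wyckoff-position fields into per-element lists.

    Returns (species, num_ions, sites) where sites[k] lists the positions
    occupied by species[k] as strings such as "4a" (assumed format).
    """
    species, num_ions, sites = [], [], []
    index = {}  # element symbol -> position in the output lists
    for element, mult, letter in zip(elements, multiplicity, wyckoff_letters):
        if element not in index:
            index[element] = len(species)
            species.append(element)
            num_ions.append(0)
            sites.append([])
        k = index[element]
        num_ions[k] += mult
        sites[k].append(f"{mult}{letter}")
    return species, num_ions, sites


# Space group 2 (P-1): positions a..h have site symmetry -1, position i has 1.
p1bar = [(letter, "-1") for letter in "abcdefgh"] + [("i", "1")]
p1bar_enumeration = enumerate_by_site_symmetry(p1bar)

# Made-up per-position fields, regrouped by element.
species, num_ions, sites = group_by_element(
    ["Na", "Cl", "Na"], [4, 4, 8], ["a", "b", "c"]
)
```

Here `p1bar_enumeration` assigns `a` and `b` the values 0 and 1, as in the description, and restarts at 0 for `i` because its site symmetry differs. The regrouped lists have one entry per element, which is the layout the description gives for <code>species</code>, <code>numIons</code> and <code>sites</code>.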
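The <code>structural_validity</code> criterion can be sketched in plain Python as a check on the shortest interatomic distance under periodic boundary conditions. This is an illustrative approximation, not the CDVAE implementation: the function names are hypothetical, only the 27 neighbouring periodic images are searched (sufficient for cells that are not strongly skewed), and distances between an atom and its own periodic images are ignored.

```python
import itertools
import math


def frac_to_cart(frac, lattice):
    """Convert a fractional vector to Cartesian using row lattice vectors."""
    return tuple(
        sum(frac[k] * lattice[k][d] for k in range(3)) for d in range(3)
    )


def min_pair_distance(frac_coords, lattice):
    """Shortest distance between distinct atoms, over neighbouring images."""
    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    best = math.inf
    for i, j in itertools.combinations(range(len(frac_coords)), 2):
        for shift in shifts:
            delta = [
                frac_coords[j][k] + shift[k] - frac_coords[i][k]
                for k in range(3)
            ]
            distance = math.dist((0.0, 0.0, 0.0), frac_to_cart(delta, lattice))
            best = min(best, distance)
    return best


def is_structurally_valid(frac_coords, lattice, cutoff=0.5):
    """True if no two atoms are closer than `cutoff` (in Angstroms)."""
    return min_pair_distance(frac_coords, lattice) >= cutoff


# A made-up cubic cell with a 4 Angstrom edge.
cubic = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
well_separated = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
too_close = [(0.0, 0.0, 0.0), (0.95, 0.0, 0.0)]  # 0.2 Angstrom apart across the boundary
```

With these inputs, `is_structurally_valid(well_separated, cubic)` holds while `is_structurally_valid(too_close, cubic)` does not, because the second pair is only 0.2 Angstroms apart once the periodic image is taken into account.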
Provider:
figshare
Created:
2025-05-21



