Materials Project Trajectory (MPtrj) Dataset
Source: figshare.com | Updated: 2024-02-01 | Indexed: 2025-01-22
Download link:
https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842/2
Description:
This data file is the MPtrj dataset. The JSON file contains 1,580,395 structures, 1,580,395 energies, 7,944,833 magnetic moments, 49,295,660 forces, and 14,223,555 stresses that were used to train the pretrained CHGNet.

The structures and labels are parsed from all the GGA/GGA+U static/relaxation trajectories in the 2022.9 version of the Materials Project, with a selection method that avoids incompatible calculations and duplicated structures.

The format of the JSON file looks like this:

MPtrj
  - 'mp-id-0'
    - 'frame-id-0'
      - 'structure': dictionary of pymatgen.core.Structure
      - 'uncorrected_total_energy': [eV] raw energy from VASP output
      - 'corrected_total_energy': [eV] VASP total energy after MP2020 compatibility
      - 'energy_per_atom': [eV/atom] corrected energy per atom; this is the energy label used to train CHGNet
      - 'ef_per_atom': [eV/atom] formation energy per atom
      - 'e_per_atom_relaxed': [eV/atom] corrected energy per atom of the relaxed structure; this is the energy you can find for the mp-id on the Materials Project website
      - 'ef_per_atom_relaxed': [eV/atom] formation energy per atom of the relaxed structure
      - 'force': [eV/Å] force on the atoms
      - 'stress': [kBar] stress on the cell
      - 'magmom': [muB] magmom on the atoms
      - 'bandgap': [eV] bandgap
    - 'frame-id-1' ...
  - 'mp-id-1' ...

Notes:

1. The frame id has the syntax 'task_id-calc_id-ionic_step', where 'calc_id' is 0 (second relaxation) or 1 (first relaxation) in the double relaxation process of each Materials Project relaxation task.

2. MPtrj is a diverse dataset that contains both GGA and GGA+U calculations, which give different energy values. MP2020 compatibility is therefore applied to the raw VASP energies to make GGA and GGA+U energies compatible with each other. The 'energy_per_atom' label (after MP2020 correction) is the one used to train the pretrained CHGNet. See: https://pymatgen.org/pymatgen.entries.html#pymatgen.entries.compatibility.Compatibility

3. Some MAGMOM labels are missing in MPtrj and are stored as None. A None label does not mean that the MAGMOM is 0. CHGNet is trained on the absolute value of the DFT magmom, that is, the absolute value of the labels contained in MPtrj. This conversion is automatic if you use the dataset class we provide, see: https://github.com/CederGroupHub/chgnet/blob/main/chgnet/data/dataset.py

4. The stress values in the MPtrj JSON are raw stress values from VASP. The CHGNet output stress is in GPa, which is -0.1 times the raw VASP stress in the MPtrj dataset. This unit conversion is also implemented in the CHGNet dataset class, so you do not have to convert the VASP stress yourself when passing it to the dataset object.

Reference:

If you use CHGNet or the MPtrj dataset, please cite:

@article{deng_2023_chgnet,
  title={CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling},
  DOI={10.1038/s42256-023-00716-3},
  journal={Nature Machine Intelligence},
  author={Deng, Bowen and Zhong, Peichen and Jun, KyuJung and Riebesell, Janosh and Han, Kevin and Bartel, Christopher J. and Ceder, Gerbrand},
  year={2023},
  pages={1–11}
}
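The nested layout and the frame id syntax described above can be illustrated with a small sketch. It is not part of the dataset or of CHGNet: the helper names `parse_frame_id` and `iter_frames`, the sample ids, and the sample values are all hypothetical, and the sample dictionary only mimics the 'mp-id' / 'frame-id' nesting with two of the label keys. The sketch assumes task ids may contain a hyphen themselves (as in 'mp-...'), so the frame id is split from the right.

```python
from typing import Iterator, NamedTuple


class FrameId(NamedTuple):
    task_id: str     # Materials Project task id (may itself contain '-')
    calc_id: int     # 1 = first relaxation, 0 = second relaxation
    ionic_step: int  # ionic step within that relaxation


def parse_frame_id(frame_id: str) -> FrameId:
    """Split 'task_id-calc_id-ionic_step' from the right (hypothetical helper)."""
    task_id, calc_id, ionic_step = frame_id.rsplit("-", 2)
    return FrameId(task_id, int(calc_id), int(ionic_step))


def iter_frames(mptrj: dict) -> Iterator[tuple[str, FrameId, dict]]:
    """Yield (mp_id, parsed frame id, label dict) for every frame."""
    for mp_id, frames in mptrj.items():
        for frame_id, labels in frames.items():
            yield mp_id, parse_frame_id(frame_id), labels


# Made-up sample that follows the documented nesting; the real file would
# be read with json.load and carries all of the label keys listed above.
sample = {
    "mp-0000": {
        "mp-0000-1-0": {"energy_per_atom": -1.0, "magmom": None},
        "mp-0000-0-3": {"energy_per_atom": -1.2, "magmom": [0.5, -0.5]},
    }
}

for mp_id, fid, labels in iter_frames(sample):
    print(mp_id, fid.task_id, fid.calc_id, fid.ionic_step, labels["energy_per_atom"])
```

Splitting with `rsplit("-", 2)` keeps the whole task id intact regardless of how many hyphens it contains, as long as the last two fields are the integer calc id and ionic step.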
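Notes 3 and 4 describe two label conventions that the CHGNet dataset class applies automatically. For readers who process the JSON without that class, the following is a minimal sketch of the same conventions as stated in the notes. The function names are hypothetical, and the sketch assumes the stress is stored as a nested list (for example 3x3) and the magmom as a per-atom list or None. It does not reproduce the actual CHGNet implementation.

```python
from typing import Optional, Sequence

KBAR_TO_GPA = 0.1  # 1 kBar = 0.1 GPa


def vasp_stress_to_gpa(stress_kbar: Sequence[Sequence[float]]) -> list[list[float]]:
    """Apply the -0.1 factor from note 4: raw VASP stress [kBar] to stress [GPa]."""
    return [[-KBAR_TO_GPA * value for value in row] for row in stress_kbar]


def magmom_training_label(magmom: Optional[Sequence[float]]) -> Optional[list[float]]:
    """Return absolute magmoms (note 3); a missing label stays None, not 0."""
    if magmom is None:
        return None
    return [abs(m) for m in magmom]


# Made-up values, for illustration only.
raw_stress = [[10.0, 0.0, 0.0], [0.0, 20.0, 0.0], [0.0, 0.0, -30.0]]
print(vasp_stress_to_gpa(raw_stress))
print(magmom_training_label([0.5, -0.5]))
print(magmom_training_label(None))
```

Keeping None distinct from 0 matters when building a loss mask: frames without a magmom label should be skipped for the magmom term rather than treated as non-magnetic.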
Provider:
figshare
Collected Summary
Dataset Introduction

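Background and Challenges
Background Overview
The Materials Project Trajectory (MPtrj) Dataset is a large-scale dataset of 1,580,395 structures with energy, magnetic moment, force, and stress labels, used to train the pretrained CHGNet model. The data come from the GGA/GGA+U static and relaxation trajectories of the Materials Project, and have been filtered and processed with MP2020 compatibility corrections so that the labels are consistent and usable across calculations.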
The summary above was collected and generated by 遇见数据集.



