Materials Project Trajectory (MPtrj) Dataset
Source: DataCite Commons | Updated 2024-02-01 | Indexed 2024-08-18
Download link:
https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842
Description:
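This data file is the MPtrj dataset. The JSON file contains 1,580,395 structures, 1,580,395 energies, 7,944,833 magnetic moments, 49,295,660 forces, and 14,223,555 stresses. These were used to train the pretrained CHGNet model.
The structures and labels were parsed from all GGA (generalized gradient approximation) and GGA+U (GGA with an on-site Coulomb correction) static and relaxation trajectories in the 2022.9 version of the Materials Project. A selection method was applied to exclude incompatible calculations and duplicate structures.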
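The JSON file is organized as follows:

- `MPtrj`
  - `'mp-id-0'`
    - `'frame-id-0'`
      - `'structure'`: dictionary of a `pymatgen.core.Structure`
      - `'uncorrected_total_energy'`: [eV] raw total energy from the VASP (Vienna Ab initio Simulation Package) output
      - `'corrected_total_energy'`: [eV] VASP total energy after the MP2020 compatibility correction
      - `'energy_per_atom'`: [eV/atom] corrected energy per atom; this is the energy label used to train CHGNet
      - `'ef_per_atom'`: [eV/atom] formation energy per atom
      - `'e_per_atom_relaxed'`: [eV/atom] corrected energy per atom of the relaxed structure; this is the energy shown for the mp-id on the Materials Project website
      - `'ef_per_atom_relaxed'`: [eV/atom] formation energy per atom of the relaxed structure
      - `'force'`: [eV/Å] forces on the atoms
      - `'stress'`: [kBar] stress on the cell
      - `'magmom'`: [μB] magnetic moments of the atoms
      - `'bandgap'`: [eV] band gap
    - `'frame-id-1'` ...
  - `'mp-id-1'` ...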
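The nested layout above (mp-id, then frame-id, then a dict of labels) can be walked with a plain generator. The following is a minimal sketch that uses only the standard library. The two-entry `SAMPLE_JSON`, its ids, and its values are hypothetical stand-ins for the real file. The file name `MPtrj.json` that appears in a comment is also an assumption. Note that `json.load` reads the whole file into memory, so loading the full dataset this way needs a correspondingly large amount of RAM.

```python
from __future__ import annotations

import json
from typing import Any, Iterator


def iter_frames(mptrj: dict[str, Any]) -> Iterator[tuple[str, str, dict[str, Any]]]:
    """Walk the nested layout mp-id -> frame-id -> label dict, one frame at a time."""
    for mp_id, frames in mptrj.items():
        for frame_id, frame in frames.items():
            yield mp_id, frame_id, frame


# Hypothetical sample with made-up values; only two label keys are shown.
SAMPLE_JSON = """
{
  "mp-1": {
    "mp-100-1-0": {"energy_per_atom": -5.10, "magmom": null},
    "mp-100-1-1": {"energy_per_atom": -5.20, "magmom": null}
  },
  "mp-2": {
    "mp-200-0-0": {"energy_per_atom": -3.00, "magmom": [0.5, -0.5]}
  }
}
"""

# For the real file you would instead do something like:
#   with open("MPtrj.json") as f:   # hypothetical local file name
#       mptrj = json.load(f)
# and rebuild each structure with pymatgen.core.Structure.from_dict(frame["structure"]).
mptrj = json.loads(SAMPLE_JSON)

for mp_id, frame_id, frame in iter_frames(mptrj):
    print(mp_id, frame_id, frame["energy_per_atom"])
```

A generator keeps the per-frame logic separate from the nesting, so the same loop body works whether you iterate a small sample or the full file.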
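### Notes
1. The frame ID has the syntax `task_id-calc_id-ionic_step`. For each Materials Project relaxation task, which is run as a double relaxation, `calc_id` is 0 for the second relaxation and 1 for the first.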
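2. MPtrj contains both GGA and GGA+U calculations, and their energies are not directly comparable. For this reason, the MP2020 compatibility correction is applied to the raw VASP energies so that GGA and GGA+U energies become mutually compatible. The energy label used to pretrain CHGNet is the MP2020-corrected `energy_per_atom`.
See: https://pymatgen.org/pymatgen.entries.html#pymatgen.entries.compatibility.Compatibility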
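3. Some MAGMOM labels are missing from MPtrj and are stored as None. A None label does not mean that the magnetic moment is 0. CHGNet is trained on the absolute value of the DFT (density functional theory) magnetic moments, that is, on the absolute values of the labels in MPtrj. If you use the dataset code provided with CHGNet, this conversion is applied automatically. See: https://github.com/CederGroupHub/chgnet/blob/main/chgnet/data/dataset.py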
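4. The stress values in the MPtrj JSON file are raw stress values from VASP. CHGNet outputs stress in GPa, and the output value equals -0.1 × the raw VASP stress in MPtrj. This conversion is already implemented in the CHGNet dataset code, so you do not need to convert the VASP stress units yourself when passing them to the dataset object.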
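Note 1's frame-ID syntax can be parsed as shown below. This is a minimal sketch, and `parse_frame_id`, `FrameId`, and the id `"mp-1234-0-5"` are hypothetical names chosen for illustration. The sketch splits from the right with `str.rsplit("-", 2)` so that a `task_id` that itself contains a hyphen (as Materials Project task ids such as `mp-1234` do) is kept intact. It assumes the last two fields are always the integers `calc_id` and `ionic_step`.

```python
from typing import NamedTuple


class FrameId(NamedTuple):
    task_id: str
    calc_id: int      # 1 = first relaxation, 0 = second relaxation (per note 1)
    ionic_step: int


def parse_frame_id(frame_id: str) -> FrameId:
    """Split 'task_id-calc_id-ionic_step' into its three fields."""
    # Split from the right so hyphens inside task_id (e.g. "mp-1234") survive.
    task_id, calc_id, ionic_step = frame_id.rsplit("-", 2)
    return FrameId(task_id, int(calc_id), int(ionic_step))


def relaxation_stage(calc_id: int) -> str:
    """Map calc_id to its stage in the double relaxation described in note 1."""
    return {1: "first relaxation", 0: "second relaxation"}[calc_id]


fid = parse_frame_id("mp-1234-0-5")  # hypothetical frame id
print(fid)                           # FrameId(task_id='mp-1234', calc_id=0, ionic_step=5)
print(relaxation_stage(fid.calc_id))  # second relaxation
```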
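Note 2's relationship between the energy fields can be checked per frame. This sketch does not reapply the correction; in pymatgen that is done by `MaterialsProject2020Compatibility` in `pymatgen.entries.compatibility`. It only reads back the stored values, under two stated assumptions. First, the MP2020 correction for a frame is taken to be `corrected_total_energy - uncorrected_total_energy`. Second, `energy_per_atom` is taken to equal `corrected_total_energy` divided by the number of sites in the `Structure.as_dict()` payload. The frame and its energies are made up, and `energy_summary` is a hypothetical helper.

```python
import math


def energy_summary(frame: dict) -> dict:
    """Relate the raw, MP2020-corrected, and per-atom energies of one frame.

    Assumes frame["structure"] is a pymatgen Structure.as_dict() payload whose
    "sites" list has one entry per atom.
    """
    n_atoms = len(frame["structure"]["sites"])
    correction = frame["corrected_total_energy"] - frame["uncorrected_total_energy"]
    return {
        "n_atoms": n_atoms,
        "mp2020_correction_eV": correction,
        "corrected_eV_per_atom": frame["corrected_total_energy"] / n_atoms,
    }


# Hypothetical 4-atom frame with made-up energies.
frame = {
    "structure": {"sites": [{}, {}, {}, {}]},  # placeholder sites
    "uncorrected_total_energy": -20.0,
    "corrected_total_energy": -21.5,
    "energy_per_atom": -5.375,
}

summary = energy_summary(frame)
print(summary)  # {'n_atoms': 4, 'mp2020_correction_eV': -1.5, 'corrected_eV_per_atom': -5.375}
# Under the stated assumption, this should match the stored training label.
assert math.isclose(summary["corrected_eV_per_atom"], frame["energy_per_atom"])
```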
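Note 3 describes two rules for magnetic moments: a missing label must stay missing, and the training target is the absolute value. The sketch below shows both for anyone feeding the raw JSON into their own pipeline instead of CHGNet's dataset code. It assumes a missing label is a whole-frame `None` rather than individual `None` entries inside the list, and `magmom_target` is a hypothetical helper.

```python
from typing import List, Optional, Sequence


def magmom_target(magmom: Optional[Sequence[float]]) -> Optional[List[float]]:
    """Turn an MPtrj magmom label into the magnitude-only target described in note 3."""
    if magmom is None:
        # Missing label: keep it missing so it can be skipped; never treat it as 0.
        return None
    return [abs(m) for m in magmom]


print(magmom_target([2.1, -2.1, 0.0]))  # [2.1, 2.1, 0.0]
print(magmom_target(None))              # None
```

Returning `None` rather than a list of zeros lets downstream code mask these frames out of a magmom loss, instead of training toward a value that was never computed.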
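Note 4's conversion is a single element-wise factor of -0.1. This factor combines the unit change (1 kBar = 0.1 GPa) with the sign flip the note describes. The stress count in the description, 14,223,555, is exactly 9 × 1,580,395, which is consistent with one 3×3 tensor per structure; the sketch assumes that shape. The raw tensor values are made up, and `vasp_kbar_to_chgnet_gpa` is a hypothetical helper. CHGNet's dataset code already performs this conversion, so the sketch only matters for custom pipelines.

```python
from typing import List, Sequence


def vasp_kbar_to_chgnet_gpa(stress_kbar: Sequence[Sequence[float]]) -> List[List[float]]:
    """Convert a raw VASP stress tensor [kBar] to CHGNet's convention [GPa].

    The factor -0.1 combines the unit change (1 kBar = 0.1 GPa) with the sign
    flip stated in note 4.
    """
    return [[-0.1 * s for s in row] for row in stress_kbar]


# Hypothetical raw VASP stress for one frame (made-up values, kBar).
raw = [[10.0, 2.0, 0.0],
       [2.0, 10.0, 0.0],
       [0.0, 0.0, -5.0]]
gpa = vasp_kbar_to_chgnet_gpa(raw)
print(gpa)
```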
### Citation
If you use CHGNet or the MPtrj dataset, please cite:
```bibtex
@article{deng_2023_chgnet,
  title={CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling},
  DOI={10.1038/s42256-023-00716-3},
  journal={Nature Machine Intelligence},
  author={Deng, Bowen and Zhong, Peichen and Jun, KyuJung and Riebesell, Janosh and Han, Kevin and Bartel, Christopher J. and Ceder, Gerbrand},
  year={2023},
  pages={1–11}
}
```
Provider: figshare
Created: 2023-07-20