
Materials Project Trajectory (MPtrj) Dataset

DataCite Commons, updated 2025-06-01, indexed 2024-08-18
Download link:
https://figshare.com/articles/dataset/Materials_Project_Trjectory_MPtrj_Dataset/23713842/1
Resource description:
This data file is the MPtrj dataset. The JSON file contains 1,580,395 structures, 1,580,395 energies, 7,944,833 magnetic moments, 49,295,660 forces, and 14,223,555 stresses that were used to train the pretrained CHGNet. The structures and labels are parsed from all GGA/GGA+U static and relaxation trajectories in the 2022.9 version of the Materials Project, using a selection method that avoids incompatible calculations and duplicated structures.

The format of the JSON file looks like this:

MPtrj
    -'mp-id-0'
        -'frame-id-0'
            -'structure': dictionary of pymatgen.core.Structure
            -'uncorrected_total_energy': [eV] raw energy from VASP output
            -'corrected_total_energy': [eV] VASP total energy after MP2020 compatibility
            -'energy_per_atom': [eV/atom] corrected energy per atom; this is the energy label used to train CHGNet
            -'ef_per_atom': [eV/atom] formation energy per atom
            -'e_per_atom_relaxed': [eV/atom] corrected energy per atom of the relaxed structure; this is the energy shown for the mp-id on the Materials Project website
            -'ef_per_atom_relaxed': [eV/atom] formation energy per atom of the relaxed structure
            -'force': [eV/Å] force on the atoms
            -'stress': [kBar] stress on the cell
            -'magmom': [μB] magmom on the atoms
            -'bandgap': [eV] bandgap
        -'frame-id-1'
            ...
    -'mp-id-1'
        ...

Notes:
1. The frame id has the syntax 'task_id-calc_id-ionic_step', where 'calc_id' is 0 (the second relaxation) or 1 (the first relaxation) in the double relaxation process used for each Materials Project relaxation task.
2. MPtrj is a diverse dataset that contains both GGA and GGA+U calculations, which have different energy values, so MP2020 compatibility is applied to the raw VASP energies to make GGA and GGA+U energies mutually compatible. The 'energy_per_atom' (after MP2020 correction) is the label used to train the pretrained CHGNet. See: https://pymatgen.org/pymatgen.entries.html#pymatgen.entries.compatibility.Compatibility
3. CHGNet is trained on the absolute value of the DFT magmom. The unit conversion is automatic if you use the dataset class we provide. See: https://github.com/CederGroupHub/chgnet/blob/main/chgnet/data/dataset.py
4. CHGNet outputs stress in GPa, which is -0.1 times the raw VASP stress (in kBar) stored in the MPtrj dataset. This unit conversion is also implemented in the CHGNet dataset class, so you do not have to convert the VASP stress units when passing them to the dataset object.

Reference:
If you use CHGNet or the MPtrj dataset, please cite:

@article{deng_2023_chgnet,
    title={{CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling}},
    author={Deng, Bowen and Zhong, Peichen and Jun, KyuJung and Riebesell, Janosh and Han, Kevin and Bartel, Christopher J and Ceder, Gerbrand},
    journal={arXiv preprint arXiv:2302.14231},
    year={2023},
    url = {https://arxiv.org/abs/2302.14231}
}
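The nested layout and the frame id syntax described above can be walked with a few lines of Python. The following is a minimal sketch, not code from CHGNet: the helper names `FrameId`, `parse_frame_id`, and `iter_frames` are hypothetical, the `toy` dictionary is synthetic data standing in for the real file, and it assumes that a task id may itself contain hyphens (as in 'mp-123'), so the frame id is split from the right.

```python
from typing import Iterator, NamedTuple, Tuple


class FrameId(NamedTuple):
    """Parsed form of a 'task_id-calc_id-ionic_step' frame id."""
    task_id: str
    calc_id: int      # 0 = second relaxation, 1 = first relaxation
    ionic_step: int


def parse_frame_id(frame_id: str) -> FrameId:
    # Assumption: task_id can contain hyphens, so only the last two
    # hyphen-separated fields are read as calc_id and ionic_step.
    task_id, calc_id, ionic_step = frame_id.rsplit("-", 2)
    return FrameId(task_id, int(calc_id), int(ionic_step))


def iter_frames(mptrj: dict) -> Iterator[Tuple[str, FrameId, dict]]:
    """Yield (mp_id, parsed frame id, frame dict) for every frame."""
    for mp_id, frames in mptrj.items():
        for frame_id, frame in frames.items():
            yield mp_id, parse_frame_id(frame_id), frame


# Synthetic stand-in with the same nesting as the JSON file; the real
# dictionary would come from json.load() on the downloaded file.
toy = {
    "mp-0": {
        "mp-0-1-0": {"energy_per_atom": -1.0},
        "mp-0-0-2": {"energy_per_atom": -1.5},
    }
}

for mp_id, fid, frame in iter_frames(toy):
    print(mp_id, fid.task_id, fid.calc_id, fid.ionic_step,
          frame["energy_per_atom"])
```

Because the full file holds over 1.5 million frames, loading it in one `json.load` call is likely to need a large amount of memory; the generator above at least avoids building a second copy of the data while iterating.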

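For readers who use the raw labels outside the CHGNet dataset class, the two conventions in notes 3 and 4 can be reproduced by hand. This is a sketch under stated assumptions rather than the CHGNet implementation: the function names are hypothetical, the stress is assumed to be stored as a 3x3 nested list in kBar, and a frame without magnetic moments is assumed to carry `None` for 'magmom'.

```python
from typing import List, Optional

KBAR_TO_GPA = 0.1  # 1 kBar = 0.1 GPa


def vasp_stress_to_gpa(stress_kbar: List[List[float]]) -> List[List[float]]:
    """Apply the -0.1 factor from note 4: raw VASP kBar -> GPa with flipped sign."""
    return [[-KBAR_TO_GPA * s for s in row] for row in stress_kbar]


def abs_magmoms(magmoms: Optional[List[float]]) -> Optional[List[float]]:
    """Take absolute values of per-atom magmoms (note 3).

    Assumption: frames without magmom labels store None, which is passed through.
    """
    if magmoms is None:
        return None
    return [abs(m) for m in magmoms]


# Synthetic frame for illustration only; the values are not from MPtrj.
frame = {
    "stress": [[10.0, 0.0, 0.0],
               [0.0, -25.0, 0.0],
               [0.0, 0.0, 5.0]],   # kBar, raw VASP convention
    "magmom": [-2.5, 0.1, 3.0],    # muB
}

stress_gpa = vasp_stress_to_gpa(frame["stress"])
magmom_abs = abs_magmoms(frame["magmom"])
print(stress_gpa[0][0], stress_gpa[1][1], magmom_abs)
```

When the frames are passed to the CHGNet dataset object, these conversions should not be applied beforehand, since the notes state that the dataset class performs them itself.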
Provider:
figshare
Created:
2023-07-20