Dataset for Machine-learning structural reconstructions for accelerated point defect calculations

Source: NIAID Data Ecosystem. Indexed: 2026-05-01
Download link:
https://zenodo.org/record/10579526
Description:
Datasets generated for the publication "Machine-learning structural reconstructions for accelerated point defect calculations". It contains the following compressed files:

Chalcogenide dataset:

bulk_primitive_folders.tar.gz: DFT relaxation trajectories and final relaxed structures for the bulk (pristine) primitive cells of ~50 chalcogenide hosts (e.g. vasprun.xml and CONTCAR files). The data is organised into directories according to the composition formula (e.g. BiSCl, BiSBr).

defect_relaxations.tar.gz: DFT relaxation trajectories for ~130 neutral cation vacancies in ~50 chalcogenide hosts (e.g. vasprun.xml, POSCAR and CONTCAR files). The data is organised first by training, validation and test splits (the folders 01_Training, 02_Validation, 03_Test, 03b_Zincblende, 03c_Iodides). Each of these folders is then organised by host compositions -> defects -> distortions -> VASP files, e.g.:

01_Training/
├── BiSCl/
│   └── v_Bi_s0_0/
│       ├── Unperturbed/
│       │   ├── POSCAR
│       │   ├── CONTCAR
│       │   └── vasprun.xml
│       └── Bond_Distortion_60.0%/
│           ├── POSCAR
│           ├── CONTCAR
│           └── vasprun.xml
└── Ca2SiS4/
    └── v_Ca_s0_0/
        └── Unperturbed/
            ├── vasprun.xml
            ├── vasprun.xml_22_20_34on16_03_23.gz
            ├── POSCAR
            └── CONTCAR

Each defect has 14 or 15 distortions (i.e. different sampling structures that we then relax). These are named after the distortion factor that we apply to the defect's nearest neighbours (e.g. Bond_Distortion_60.0% corresponds to increasing the distance between a vacancy and its nearest neighbours by a factor of 1.6). Unperturbed denotes no distortion (i.e. relaxing the ideal, high-symmetry defect structure). Bond_Distortion_0.0% denotes rattling only (random displacements applied to all atoms in the supercell).

Each defect directory should contain a yaml file (e.g. v_Ca_s0_0/v_Ca_s0_0.yaml) which maps each distortion to the final energy of the relaxed structure. This file was used to check whether a distortion led to a more favourable configuration than the ideal, high-symmetry relaxation (termed Unperturbed).

Note that some distortions required several relaxations. In these cases, the older vasprun.xml files carry a date suffix (e.g. vasprun.xml_22_20_34on16_03_23.gz), and the plain vasprun.xml is the last, most recent relaxation. Also note that a small number of ionic steps are not electronically converged (e.g. NELM was reached for those steps), so these should not be included for training/validation.

We also include the Python script parsing_functions.py that we used to parse the vasprun.xml files, in case it is useful. It parses the data into a dictionary organised by host composition -> defect -> distortion, orders the vasprun.xml files by date and removes electronically unconverged ionic steps.

Finally, the file vac_chalcogenides_trained_model_and_datasets.tar.gz contains:

The trained M3GNet model for the neutral cation vacancies in the chalcogenide hosts.

Pickle files with the processed training and validation sets (obtained by processing the VASP output files in defect_relaxations.tar.gz). The pickle files are formatted as dictionaries with the keys structures (mapping to a list of pymatgen Structure objects), energies (a list with the respective energies in units of eV), forces (a list of forces in units of eV/Å) and stresses (a list of stresses in units of GPa).
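The naming conventions described above can be illustrated with a short Python sketch. This is not the authors' parsing_functions.py; the function names are hypothetical, and two points are assumptions rather than facts stated in the description: that the date suffix reads as HH_MM_SSonDD_MM_YY, and that energies are supplied as a plain dictionary keyed by distortion folder name (the actual layout of the yaml file is not specified here, so loading it is left out).

```python
import re
from datetime import datetime

# Assumed suffix layout: vasprun.xml_HH_MM_SSonDD_MM_YY(.gz)
_STAMP = re.compile(
    r"vasprun\.xml_(\d{2})_(\d{2})_(\d{2})on(\d{2})_(\d{2})_(\d{2})(?:\.gz)?"
)


def distortion_factor(folder_name):
    """Factor applied to the vacancy-neighbour distances, or None for Unperturbed."""
    if folder_name == "Unperturbed":
        return None
    match = re.fullmatch(r"Bond_Distortion_(-?\d+(?:\.\d+)?)%", folder_name)
    if match is None:
        raise ValueError(f"Unrecognised distortion folder: {folder_name}")
    # e.g. 60.0% -> 1.6; 0.0% -> 1.0 (rattling only)
    return 1.0 + float(match.group(1)) / 100.0


def order_vaspruns(file_names):
    """Order vasprun files oldest first, with the undated vasprun.xml last."""
    dated, latest = [], []
    for name in file_names:
        if name in ("vasprun.xml", "vasprun.xml.gz"):
            latest.append(name)  # undated file is the most recent relaxation
            continue
        match = _STAMP.fullmatch(name)
        if match:
            hh, mi, ss, dd, mo, yy = map(int, match.groups())
            dated.append((datetime(2000 + yy, mo, dd, hh, mi, ss), name))
    # Files that are not vasprun outputs (POSCAR, CONTCAR, ...) are ignored.
    return [name for _, name in sorted(dated)] + latest


def lower_energy_distortions(energies, tol=0.0):
    """Distortions whose final energy (eV) lies below the Unperturbed one by more than tol."""
    reference = energies["Unperturbed"]
    return {
        name: energy - reference
        for name, energy in energies.items()
        if name != "Unperturbed" and energy - reference < -tol
    }
```

For example, order_vaspruns applied to the file names in the Ca2SiS4 folder shown above would place the dated vasprun.xml_22_20_34on16_03_23.gz before vasprun.xml and skip POSCAR and CONTCAR.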
Created:
2024-02-21