Materialyze/matpes
Source: Hugging Face · Updated 2026-04-21 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Materialyze/matpes
Resource description:
---
license: bsd-3-clause
task_categories:
- graph-ml
language:
- en
tags:
- chemistry
- materials
size_categories:
- 100K<n<1M
dataset_info:
  config_name: default
  features:
  # ---- Composition / chemistry ----
  - name: nsites
    dtype: int32
  - name: elements
    sequence: string
  - name: nelements
    dtype: int32
  - name: composition
    sequence:
    - name: element
      dtype: string
    - name: amount
      dtype: float64
  - name: composition_reduced
    sequence:
    - name: element
      dtype: string
    - name: amount
      dtype: float64
  - name: formula_pretty
    dtype: string
  - name: formula_anonymous
    dtype: string
  - name: chemsys
    dtype: string
  # ---- Cell-level scalars ----
  - name: volume
    dtype: float64
  - name: density
    dtype: float64
  - name: density_atomic
    dtype: float64
  # ---- Symmetry ----
  - name: symmetry
    struct:
    - name: crystal_system
      dtype: string
    - name: symbol
      dtype: string
    - name: number
      dtype: int32
    - name: point_group
      dtype: string
    - name: symprec
      dtype: float64
    - name: angle_tolerance
      dtype: float64
    - name: version
      dtype: string
  # ---- Pymatgen Structure ----
  - name: structure
    struct:
    - name: '@module'
      dtype: string
    - name: '@class'
      dtype: string
    - name: charge
      dtype: float64
    - name: lattice
      struct:
      - name: matrix
        sequence:
          sequence: float64
      - name: pbc
        sequence: bool
      - name: a
        dtype: float64
      - name: b
        dtype: float64
      - name: c
        dtype: float64
      - name: alpha
        dtype: float64
      - name: beta
        dtype: float64
      - name: gamma
        dtype: float64
      - name: volume
        dtype: float64
    - name: properties
      dtype: string
    - name: sites
      sequence:
      - name: species
        sequence:
        - name: element
          dtype: string
        - name: occu
          dtype: float64
      - name: abc
        sequence: float64
      - name: properties
        struct:
        - name: magmom
          dtype: float64
      - name: label
        dtype: string
      - name: xyz
        sequence: float64
  # ---- Labels (DFT targets) ----
  - name: energy
    dtype: float64
  - name: forces
    sequence:
      sequence: float64
  - name: stress
    sequence: float64
  # ---- Identifiers / derived properties ----
  - name: matpes_id
    dtype: string
  - name: bandgap
    dtype: float64
  - name: functional
    dtype: string
  - name: formation_energy_per_atom
    dtype: float64
  - name: cohesive_energy_per_atom
    dtype: float64
  - name: abs_forces
    sequence: float64
  - name: bader_charges
    sequence: float64
  - name: bader_magmoms
    sequence: float64
  # ---- Provenance (MD sampling + MP origin) ----
  - name: provenance
    struct:
    - name: original_mp_id
      dtype: string
    - name: materials_project_version
      dtype: string
    - name: md_ensemble
      dtype: string
    - name: md_temperature
      dtype: float64
    - name: md_pressure
      dtype: float64
    - name: md_step
      dtype: int32
    - name: mlip_name
      dtype: string
configs:
- config_name: pbe
  data_files:
  - split: train
    path: MatPES-PBE-2025.2-charges.json
- config_name: r2scan
  data_files:
  - split: train
    path: MatPES-R2SCAN-2025.2-charges.json
- config_name: pbe-2025.2
  data_files: MatPES-PBE-2025.2-charges.json
- config_name: r2scan-2025.2
  data_files: MatPES-R2SCAN-2025.2-charges.json
- config_name: pbe-2025.1
  data_files: MatPES-PBE-2025.1-charges.json
- config_name: r2scan-2025.1
  data_files: MatPES-R2SCAN-2025.1-charges.json
- config_name: pbe-atoms
  data_files: MatPES-PBE-atoms.json
- config_name: r2scan-atoms
  data_files: MatPES-R2SCAN-atoms.json
papers:
- 2503.04070
---
## Dataset Description
- **Homepage:** [matpes.ai](http://matpes.ai)
- **Paper:** [A Foundational Potential Energy Surface Dataset for Materials](https://doi.org/10.48550/arXiv.2503.04070)
- **Leaderboard:** [MatCalc-Benchmark](http://matpes.ai/benchmarks)
- **Point of Contact:** [Materialyze]
### Dataset Summary
Foundation potentials (FPs), i.e., machine learning interatomic potentials (MLIPs) with near-complete coverage of the
periodic table, are trained on potential energy surface (PES) datasets with similarly broad coverage. MatPES is an
initiative by the [Materialyze] Lab and the [Materials Project] to address
[critical deficiencies](http://matpes.ai/about) in such PES datasets for materials.
1. **Accuracy.** MatPES is computed using static DFT calculations with stringent convergence criteria.
Please refer to the `MatPESStaticSet` in [pymatgen] for details.
2. **Comprehensiveness.** MatPES structures are sampled using a 2-stage version of DImensionality-Reduced
Encoded Clusters with sTratified ([DIRECT](https://doi.org/10.1038/s41524-024-01227-4)) sampling from a greatly expanded configuration space of MD structures.
3. **Quality.** MatPES includes computed data from the PBE functional, as well as the high-fidelity r2SCAN meta-GGA
functional, which gives an improved description across diverse bonding types and chemistries.
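
The stratification idea behind item 2 can be illustrated with a toy sketch. This is not the DIRECT implementation: the encoding and dimensionality-reduction steps are skipped, a simple grid binning stands in for the clustering step, and the function name `stratified_sample` and its `cell_size` and `per_cluster` parameters are hypothetical. It only shows why taking a fixed number of representatives per cluster keeps sparsely populated regions of configuration space from being drowned out by dense ones.

```python
from collections import defaultdict
from math import dist, floor


def stratified_sample(features, cell_size=1.0, per_cluster=1):
    """Pick up to `per_cluster` points from each grid cell (toy clustering).

    `features` is a list of equal-length tuples of floats, assumed to be
    already-encoded, low-dimensional descriptors of structures.
    Returns the sorted indices of the selected points.
    """
    # Stand-in clustering: points in the same grid cell share a cluster.
    clusters = defaultdict(list)
    for idx, vec in enumerate(features):
        key = tuple(floor(x / cell_size) for x in vec)
        clusters[key].append(idx)

    selected = []
    for members in clusters.values():
        # Centroid of the cluster in feature space.
        dim = len(features[members[0]])
        centroid = tuple(
            sum(features[i][d] for i in members) / len(members)
            for d in range(dim)
        )
        # Keep the members closest to the centroid as representatives.
        ranked = sorted(members, key=lambda i: dist(features[i], centroid))
        selected.extend(ranked[:per_cluster])
    return sorted(selected)


# Made-up 2D features: a dense blob near the origin and one isolated point.
features = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (0.2, 0.1), (5.5, 5.5)]
picked = stratified_sample(features, cell_size=1.0, per_cluster=1)
print(picked)  # [1, 4]
```

With one representative per cluster, the isolated point at index 4 is kept even though four of the five inputs sit in the dense blob; uniform random sampling would usually miss it.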
The initial v2025.1 release comprises ~400,000 structures from 300 K MD simulations. This dataset is much smaller
than other PES datasets in the literature and yet achieves comparable or, in some cases,
[improved performance and reliability](http://matpes.ai/benchmarks) in FPs trained on it.
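
The card metadata declares one configuration per functional and release (`pbe`, `r2scan`, `pbe-2025.1`, `r2scan-2025.1`, `pbe-2025.2`, `r2scan-2025.2`), plus `pbe-atoms` and `r2scan-atoms`. A minimal sketch of selecting one is shown below. It assumes the Hugging Face `datasets` library is installed and that the repository id is `Materialyze/matpes`, as in the title of this page. The helper names `config_name` and `load_matpes` are hypothetical and not part of any library.

```python
# Configuration names copied from the `configs` section of the card metadata.
KNOWN_CONFIGS = {
    "pbe", "r2scan",
    "pbe-2025.1", "r2scan-2025.1",
    "pbe-2025.2", "r2scan-2025.2",
    "pbe-atoms", "r2scan-atoms",
}


def config_name(functional, version=None):
    """Build a configuration name such as 'pbe' or 'r2scan-2025.1'."""
    name = functional.lower()
    if version is not None:
        name = f"{name}-{version}"
    if name not in KNOWN_CONFIGS:
        raise ValueError(f"unknown MatPES configuration: {name!r}")
    return name


def load_matpes(functional="pbe", version=None):
    """Load one configuration; this downloads data and needs network access."""
    # Imported lazily so config_name() works without the library installed.
    from datasets import load_dataset

    return load_dataset("Materialyze/matpes", config_name(functional, version))


print(config_name("PBE"))               # pbe
print(config_name("r2SCAN", "2025.1"))  # r2scan-2025.1
```

According to the metadata, the unversioned `pbe` and `r2scan` configurations point to the 2025.2 files, so pass a version explicitly (for example `load_matpes("pbe", "2025.1")`) when the initial release is wanted.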
MatPES is part of the MatML ecosystem, which includes the [MatGL] (Materials Graph Library) and [maml] (MAterials
Machine Learning) packages, the [MatPES] (Materials Potential Energy Surface) dataset, and [MatCalc] (Materials
Calculator).
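
Each record described by the feature schema in the card metadata carries a total `energy`, per-site `forces`, and the site count `nsites`. The sketch below derives two quantities that are often inspected before training a potential: the energy per atom and the largest force magnitude on any site. The record is made up for illustration, the helper name `summarize_record` is hypothetical, and the shape of `forces` (one `[fx, fy, fz]` row per site) is an assumption based on the nested-sequence type in the schema.

```python
from math import sqrt


def summarize_record(record):
    """Return energy per atom and the largest per-site force magnitude."""
    nsites = record["nsites"]
    forces = record["forces"]  # assumed: one [fx, fy, fz] row per site
    if len(forces) != nsites:
        raise ValueError("forces must have one row per site")
    norms = [sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces]
    return {
        "energy_per_atom": record["energy"] / nsites,
        "max_force": max(norms),
    }


# Made-up two-atom record using field names from the feature schema.
record = {
    "matpes_id": "example-0",
    "nsites": 2,
    "energy": -10.0,
    "forces": [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]],
}
summary = summarize_record(record)
print(summary)  # {'energy_per_atom': -5.0, 'max_force': 5.0}
```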
[Materialyze]: http://materialyze.ai
[Materials Project]: https://materialsproject.org
[M3GNet]: http://dx.doi.org/10.1038/s43588-022-00349-3
[CHGNet]: http://doi.org/10.1038/s42256-023-00716-3
[TensorNet]: https://arxiv.org/abs/2306.06482
[maml]: https://materialsvirtuallab.github.io/maml/
[MatGL]: https://matgl.ai
[MatPES]: https://matpes.ai
[MatCalc]: https://matcalc.ai
[pymatgen]: https://pymatgen.org
Provider:
Materialyze



