JSteffen91/OMOL-1k-MD
Source: Hugging Face. Updated 2026-02-20; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/JSteffen91/OMOL-1k-MD
Resource description:
---
LICENSE: The OMOL-1k-MD dataset is provided under a CC-BY-4.0 license
license: cc-by-4.0
language:
- en
tags:
- chemistry
size_categories:
- 10M<n<100M
---
## LICENSE: The OMOL-1k-MD dataset is provided under a CC-BY-4.0 license
# Dataset
This dataset contains ab initio molecular dynamics (AIMD) trajectories of 1000 arbitrarily chosen molecules/clusters from the OMol25 dataset by facebookresearch
([https://huggingface.co/facebook/OMol25](https://huggingface.co/facebook/OMol25)).
All chosen molecules/clusters have 50 or fewer atoms and are neutral, closed-shell systems.
For each of the 1000 molecules/clusters, three independent AIMD trajectories were simulated at 300 K with the Vienna Ab initio Simulation Package (VASP), using the PBE functional.
Each trajectory is 10,000 steps long (time step of 1 fs) and starts from a different initial point; in each case, 1000 separate MD steps used for equilibration have been cut off.
Thus, the dataset contains 30,000,000 structures in total, which are provided together with their PBE energies and atomic forces.
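
The total number of structures follows directly from the numbers stated above. The short sketch below reproduces that arithmetic; it assumes that "1 fs" refers to the MD time step, and the variable names are chosen here for illustration only.

```python
# Sizes as stated in the dataset description.
N_MOLECULES = 1000        # molecules/clusters selected from OMol25
TRAJ_PER_MOLECULE = 3     # independent AIMD trajectories per molecule
STEPS_PER_TRAJ = 10_000   # production steps kept per trajectory
TIME_STEP_FS = 1.0        # assumed MD time step in femtoseconds

frames_per_molecule = TRAJ_PER_MOLECULE * STEPS_PER_TRAJ
total_frames = N_MOLECULES * frames_per_molecule
traj_length_ps = STEPS_PER_TRAJ * TIME_STEP_FS / 1000.0  # 1 ps = 1000 fs

print(frames_per_molecule)
print(total_frames)
print(traj_length_ps)
```

Under that assumption, each trajectory covers 10 ps of production dynamics after equilibration.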
# How to use
The dataset is divided into 1000 AseLMDB dataset files, which are unified via pre-generated global JSON indices.
Individual molecules and MD frames, or collections of them, can be accessed via the included access_omol_1k_md.py Python script.
### Loading individual frames
In this example, frame 1263 of the second trajectory of molecule No. 124 is loaded and its contents are printed
(element symbols, atomic coordinates in Angstrom, energy in eV, forces in eV/Angstrom):
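```python
from access_omol_1k_md import FastAseLMDBDataset

# Open the dataset (here located at "omol_1k_md")
data = FastAseLMDBDataset("omol_1k_md")

# Load frame 1263 of trajectory 2 of molecule 124
atoms, item = data.get(mol_local=124, traj_no=2, frame_in_traj=1263)

print(item.symbols)    # element symbols
print(item.positions)  # atomic coordinates in Angstrom
print(item.energy)     # energy in eV
print(item.forces)     # forces in eV/Angstrom
```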
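
The same `get` call can be repeated to process several frames of one trajectory. The following is a minimal sketch, assuming that `get` accepts the keyword arguments shown in the example and that `item.forces` can be iterated as per-atom (x, y, z) triples. The helper `max_force_norms` is hypothetical and not part of access_omol_1k_md.py.

```python
import math


def max_force_norms(data, mol_local, traj_no, frames):
    """Return the largest atomic force norm (eV/Angstrom) for each requested frame.

    `data` is assumed to behave like the FastAseLMDBDataset object from the
    example; this helper is illustrative and not part of the dataset's script.
    """
    result = []
    for frame in frames:
        # Same call signature as in the single-frame example.
        _atoms, item = data.get(
            mol_local=mol_local, traj_no=traj_no, frame_in_traj=frame
        )
        # Euclidean norm of the force vector on each atom.
        norms = [math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in item.forces]
        result.append(max(norms))
    return result


# Usage (requires the dataset files and access_omol_1k_md.py to be present):
# from access_omol_1k_md import FastAseLMDBDataset
# data = FastAseLMDBDataset("omol_1k_md")
# print(max_force_norms(data, mol_local=124, traj_no=2, frames=[1263, 1264, 1265]))
```

Loading frames one by one keeps memory use small, which may matter given the 30,000,000 structures in the dataset.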
### Looking for molecules with properties
In this example, the indices of all molecules with 42 atoms that contain C and O atoms are printed:
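```python
from access_omol_1k_md import FastAseLMDBDataset

# Open the dataset (here located at "omol_1k_md")
data = FastAseLMDBDataset("omol_1k_md")

# Search for molecules with 42 atoms that contain the listed elements
mol_inds = data.find_molecules(natoms=42, contains=["C", "O"])
print(mol_inds)
```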
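
Search and loading can be combined. The sketch below assumes that the indices returned by `find_molecules` can be passed as `mol_local` to `get`, which the source does not state explicitly. The helper `load_matching_frames` is hypothetical and not part of access_omol_1k_md.py.

```python
def load_matching_frames(data, natoms, contains, traj_no, frame_in_traj, limit=5):
    """Load one frame for up to `limit` molecules matching the search criteria.

    Assumes the indices from `find_molecules` are valid `mol_local` values.
    Returns a dict mapping molecule index to the loaded item.
    """
    mol_inds = data.find_molecules(natoms=natoms, contains=contains)
    loaded = {}
    for mol in list(mol_inds)[:limit]:
        _atoms, item = data.get(
            mol_local=mol, traj_no=traj_no, frame_in_traj=frame_in_traj
        )
        loaded[mol] = item
    return loaded


# Usage (requires the dataset files and access_omol_1k_md.py to be present):
# from access_omol_1k_md import FastAseLMDBDataset
# data = FastAseLMDBDataset("omol_1k_md")
# items = load_matching_frames(data, natoms=42, contains=["C", "O"],
#                              traj_no=2, frame_in_traj=1263)
# for mol, item in items.items():
#     print(mol, item.energy)
```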
Provider:
JSteffen91



