
vector-institute/s2ef-15m

Hugging Face | Updated 2024-06-26 | Indexed 2024-06-29
Download link:
https://hf-mirror.com/datasets/vector-institute/s2ef-15m
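
With about 15 million training samples, it can be convenient to inspect a few rows before downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository id matches the page title; the helper name `peek` is hypothetical. Whether the mirror above honors the `HF_ENDPOINT` environment variable used by `huggingface_hub` is also an assumption, not something this page states.

```python
from itertools import islice

# Repository id taken from the page title.
REPO_ID = "vector-institute/s2ef-15m"


def peek(n=2):
    """Stream the first n training rows without downloading the full dataset."""
    # Imported lazily so this module still loads when `datasets` is missing.
    from datasets import load_dataset

    # streaming=True returns an iterable that fetches rows on demand.
    stream = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(stream, n))


# Example use (requires network access):
#   for row in peek():
#       print(len(row["input_ids"]), row["total_energy"])
```

Streaming avoids materializing the full download on disk, at the cost of sequential-only access.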
Summary:
This dataset is a collection of 3D atomistic datasets with force and energy labels, gathered from several sources. Each instance includes atomic numbers (input_ids), 3D coordinates (coords), per-atom forces (forces), the total and formation energy of the system (total_energy/formation_energy), and a boolean (has_formation_energy) indicating whether the dataset has a valid formation energy. The constituent datasets are upsampled so that the combined distribution is balanced.
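
The description does not say how the upsampling is done. One common approach, shown here purely as an illustrative assumption, is to sample each smaller source with replacement until it matches the largest one. The function name `upsample_to_balance` and the toy source names are hypothetical.

```python
import random


def upsample_to_balance(groups, seed=0):
    """Pad every group to the size of the largest by sampling with replacement."""
    rng = random.Random(seed)
    target = max(len(items) for items in groups.values())
    balanced = {}
    for name, items in groups.items():
        # Draw the missing items at random from the group's own members.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        balanced[name] = list(items) + extra
    return balanced


# Toy stand-ins for three sources of different sizes.
toy = {
    "source_a": list(range(6)),
    "source_b": list(range(2)),
    "source_c": list(range(3)),
}
balanced = upsample_to_balance(toy)
sizes = {name: len(items) for name, items in balanced.items()}
print(sizes)
```

Upsampling keeps every original sample and only adds repeats, so smaller sources contain duplicated entries after balancing.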
Provider:
vector-institute
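Summary of Original Information

Dataset Overview

Dataset Description

This dataset is a collection of 3D atomistic datasets with force and energy labels, drawn from the following sources: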

  • Open Catalyst Project (OC20, OC22, ODAC23)
  • Materials Project Trajectory Dataset (MPtrj)
  • SPICE 1.1.4

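Data Structure

Data Instances

Each instance contains the following features:

  • input_ids: sequence of atomic numbers, type int16
  • coords: sequence of 3D coordinates, type float32
  • forces: sequence of per-atom forces, type float32
  • formation_energy: formation energy, type float32
  • total_energy: total energy, type float32
  • has_formation_energy: whether a valid formation energy is available, type bool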

Example data (JSON):

    {
      "input_ids": [26, 28, 28, 28],
      "coords": [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 3.5395920276641846],
        [0.0, 1.7669789791107178, 1.7697960138320923],
        [1.7669789791107178, 0.0, 1.7697960138320923]
      ],
      "forces": [
        [-1.999999987845058e-08, 2.999999892949745e-08, -0.0],
        [-5.99999978589949e-08, 5.99999978589949e-08, 9.99999993922529e-09],
        [-0.0014535699738189578, 0.0014535400550812483, 9.99999993922529e-09],
        [0.001453649951145053, -0.0014536300441250205, -2.999999892949745e-08]
      ],
      "formation_energy": 0.6030612587928772,
      "total_energy": -25.20570182800293,
      "has_formation_energy": true
    }
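
The fields of an instance line up per atom: one atomic number, one coordinate triple, and one force triple each. The sketch below checks that alignment on the example instance and picks an energy label from the `has_formation_energy` flag. Falling back to `total_energy` when the flag is false is an assumption about how the flag might be used, and the helper names are hypothetical.

```python
import math

# The example instance from the dataset card, copied verbatim.
example = {
    "input_ids": [26, 28, 28, 28],
    "coords": [
        [0.0, 0.0, 0.0],
        [0.0, 0.0, 3.5395920276641846],
        [0.0, 1.7669789791107178, 1.7697960138320923],
        [1.7669789791107178, 0.0, 1.7697960138320923],
    ],
    "forces": [
        [-1.999999987845058e-08, 2.999999892949745e-08, -0.0],
        [-5.99999978589949e-08, 5.99999978589949e-08, 9.99999993922529e-09],
        [-0.0014535699738189578, 0.0014535400550812483, 9.99999993922529e-09],
        [0.001453649951145053, -0.0014536300441250205, -2.999999892949745e-08],
    ],
    "formation_energy": 0.6030612587928772,
    "total_energy": -25.20570182800293,
    "has_formation_energy": True,
}


def check_instance(instance):
    """Return the atom count after verifying per-atom field alignment."""
    n_atoms = len(instance["input_ids"])
    for key in ("coords", "forces"):
        rows = instance[key]
        if len(rows) != n_atoms or any(len(row) != 3 for row in rows):
            raise ValueError(f"{key} must have shape ({n_atoms}, 3)")
    return n_atoms


def energy_label(instance):
    """Prefer formation energy when flagged valid (assumed usage)."""
    if instance["has_formation_energy"]:
        return instance["formation_energy"]
    return instance["total_energy"]


n_atoms = check_instance(example)
# Largest per-atom force magnitude (Euclidean norm of each force vector).
max_force = max(math.sqrt(sum(c * c for c in f)) for f in example["forces"])
label = energy_label(example)
print(n_atoms, label, max_force)
```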

Dataset Splits

  • train: 15,000,000 samples, 43,353,603,080 bytes.

Dataset Size

  • Download size: 44,763,791,790 bytes
  • Dataset size: 43,353,603,080 bytes
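
To put these byte counts in more familiar terms, the short calculation below converts them to GiB and derives the average size per sample. It uses only the figures listed above; the constant and function names are illustrative.

```python
# Figures copied from the split and size listings above.
NUM_SAMPLES = 15_000_000
DATASET_BYTES = 43_353_603_080
DOWNLOAD_BYTES = 44_763_791_790


def to_gib(num_bytes):
    """Convert a byte count to gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES

print(f"dataset:  {to_gib(DATASET_BYTES):.2f} GiB")
print(f"download: {to_gib(DOWNLOAD_BYTES):.2f} GiB")
print(f"average:  {avg_bytes_per_sample:.0f} bytes per sample")
```

This works out to roughly 40 GiB on disk and a little under 3 KB per sample on average.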

Citation Information

@article{ocp_dataset,
  author  = {Chanussot*, Lowik and Das*, Abhishek and Goyal*, Siddharth and Lavril*, Thibaut and Shuaibi*, Muhammed and Riviere, Morgane and Tran, Kevin and Heras-Domingo, Javier and Ho, Caleb and Hu, Weihua and Palizhati, Aini and Sriram, Anuroop and Wood, Brandon and Yoon, Junwoong and Parikh, Devi and Zitnick, C. Lawrence and Ulissi, Zachary},
  title   = {Open Catalyst 2020 (OC20) Dataset and Community Challenges},
  journal = {ACS Catalysis},
  year    = {2021},
  doi     = {10.1021/acscatal.0c04525}
}

@article{oc22_dataset,
  author  = {Tran*, Richard and Lan*, Janice and Shuaibi*, Muhammed and Wood*, Brandon and Goyal*, Siddharth and Das, Abhishek and Heras-Domingo, Javier and Kolluru, Adeesh and Rizvi, Ammar and Shoghi, Nima and Sriram, Anuroop and Ulissi, Zachary and Zitnick, C. Lawrence},
  title   = {The Open Catalyst 2022 (OC22) dataset and challenges for oxide electrocatalysts},
  journal = {ACS Catalysis},
  year    = {2023}
}

@article{odac23_dataset,
  author  = {Anuroop Sriram and Sihoon Choi and Xiaohan Yu and Logan M. Brabson and Abhishek Das and Zachary Ulissi and Matt Uyttendaele and Andrew J. Medford and David S. Sholl},
  title   = {The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture},
  year    = {2023},
  journal = {arXiv preprint arXiv:2311.00341}
}

@article{deng_2023_chgnet,
  author  = {Deng, Bowen and Zhong, Peichen and Jun, KyuJung and Riebesell, Janosh and Han, Kevin and Bartel, Christopher J. and Ceder, Gerbrand},
  title   = {CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling},
  journal = {Nature Machine Intelligence},
  year    = {2023},
  doi     = {10.1038/s42256-023-00716-3},
  pages   = {1–11}
}

@article{eastman2023spice,
  title     = {Spice, a dataset of drug-like molecules and peptides for training machine learning potentials},
  author    = {Eastman, Peter and Behara, Pavan Kumar and Dotson, David L and Galvelis, Raimondas and Herr, John E and Horton, Josh T and Mao, Yuezhi and Chodera, John D and Pritchard, Benjamin P and Wang, Yuanqing and others},
  journal   = {Scientific Data},
  volume    = {10},
  number    = {1},
  pages     = {11},
  year      = {2023},
  publisher = {Nature Publishing Group UK London}
}
