
graphs-datasets/MD17-naphthalene

Source: Hugging Face | Updated: 2023-02-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/graphs-datasets/MD17-naphthalene
Resource description:
---
license: unknown
task_categories:
- graph-ml
---

# Dataset Card for naphthalene

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [External Use](#external-use)
  - [PyGeometric](#pygeometric)
- [Dataset Structure](#dataset-structure)
  - [Data Properties](#data-properties)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description
- **[Homepage](http://www.sgdml.org/#datasets)**
- **Paper:** (see citation)

### Dataset Summary
The `naphthalene` dataset is a molecular dynamics (MD) dataset. The total energy and force labels were computed using the PBE+vdW-TS electronic structure method. All geometries are in Angstrom; energies and forces are given in kcal/mol and kcal/mol/Å, respectively.

### Supported Tasks and Leaderboards
`naphthalene` should be used for organic molecular property prediction, a regression task on 1 property. The score used is the mean absolute error (MAE, in meV) for energy prediction.
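Because the labels are stored in kcal/mol while the score is reported in meV, energies need a unit conversion before scoring. The following is a minimal sketch, assuming the standard factor of roughly 43.3641 meV per kcal/mol; the function name `energy_mae_mev` and the sample energies are hypothetical and not part of the dataset or any library.

```python
# Approximate conversion: 1 kcal/mol = 0.0433641 eV = 43.3641 meV.
KCAL_PER_MOL_TO_MEV = 43.3641


def energy_mae_mev(predicted_kcal, target_kcal):
    """Mean absolute error in meV for energies given in kcal/mol."""
    if len(predicted_kcal) != len(target_kcal):
        raise ValueError("predictions and targets must have the same length")
    if not predicted_kcal:
        raise ValueError("at least one energy is required")
    total_abs_error = sum(abs(p - t) for p, t in zip(predicted_kcal, target_kcal))
    # Average in kcal/mol first, then convert the result to meV.
    return total_abs_error / len(predicted_kcal) * KCAL_PER_MOL_TO_MEV


# Hypothetical energies in kcal/mol, for illustration only.
predicted = [-100.0, -101.5, -99.0]
target = [-100.5, -101.0, -99.5]
mae_mev = energy_mae_mev(predicted, target)
```

Each sample prediction is off by 0.5 kcal/mol, so the result is half the conversion factor.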
## External Use

### PyGeometric
To load in PyGeometric, do the following:

```python
import torch
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

dataset_hf = load_dataset("graphs-datasets/MD17-naphthalene")

def to_pyg(graph):
    # Build one PyG Data object from the fields of a row (see Data Fields)
    return Data(
        x=torch.tensor(graph["node_feat"]),
        edge_index=torch.tensor(graph["edge_index"], dtype=torch.long),
        edge_attr=torch.tensor(graph["edge_attr"]),
        y=torch.tensor(graph["y"]),
        num_nodes=graph["num_nodes"],
    )

# For the train set (replace by valid or test as needed)
dataset_pg_list = [to_pyg(graph) for graph in dataset_hf["train"]]
dataset_pg = DataLoader(dataset_pg_list)
```

## Dataset Structure

### Data Properties

| property | value |
|---|---|
| scale | big |
| #graphs | 226255 |
| average #nodes | 18.0 |
| average #edges | 254.73246234354005 |

### Data Fields

Each row of a given file is a graph, with:
- `node_feat` (list: #nodes x #node-features): node features
- `edge_index` (list: 2 x #edges): pairs of nodes constituting edges
- `edge_attr` (list: #edges x #edge-features): features of the aforementioned edges
- `y` (list: #labels): the labels to predict
- `num_nodes` (int): number of nodes of the graph

### Data Splits

This data is not split and should be used with cross-validation. It comes from the PyGeometric version of the dataset.

## Additional Information

### Licensing Information
The dataset has been released under an unknown license.

### Citation Information
```
@inproceedings{Morris+2020,
  title={TUDataset: A collection of benchmark datasets for learning with graphs},
  author={Christopher Morris and Nils M. Kriege and Franka Bause and Kristian Kersting and Petra Mutzel and Marion Neumann},
  booktitle={ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020)},
  archivePrefix={arXiv},
  eprint={2007.08663},
  url={www.graphlearning.io},
  year={2020}
}
```
```
@article{Chmiela_2017,
  doi = {10.1126/sciadv.1603015},
  url = {https://doi.org/10.1126%2Fsciadv.1603015},
  year = 2017,
  month = {may},
  publisher = {American Association for the Advancement of Science ({AAAS})},
  volume = {3},
  number = {5},
  author = {Stefan Chmiela and Alexandre Tkatchenko and Huziel E. Sauceda and Igor Poltavsky and Kristof T. Schütt and Klaus-Robert Müller},
  title = {Machine learning of accurate energy-conserving molecular force fields},
  journal = {Science Advances}
}
```
Provided by:
graphs-datasets
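Summary of original information

Dataset overview

Dataset name

  • naphthalene

Dataset type

  • Molecular dynamics (MD) dataset

Dataset purpose

  • Organic molecular property prediction: a regression task on a single property

Dataset characteristics

  • Total energy and force labels were computed with the PBE+vdW-TS electronic structure method
  • Geometries are in Angstrom, energies in kcal/mol, and forces in kcal/mol/Å

Evaluation metric

  • Mean absolute error (in meV) for energy prediction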

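Dataset structure

  • Data properties

    • Scale: big
    • Number of graphs: 226255
    • Average number of nodes: 18.0
    • Average number of edges: 254.73246234354005
  • Data fields

    • node_feat: list of node features
    • edge_index: list of edge indices
    • edge_attr: list of edge features
    • y: list of labels
    • num_nodes: number of nodes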
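
The field list above implies a few shape relationships that can be checked before training. The sketch below assumes each row is a plain dictionary with the documented keys; the helper `check_graph_row` and the toy row are hypothetical and only illustrate the documented shapes (`edge_index` as 2 x #edges, one `edge_attr` entry per edge, one `node_feat` entry per node).

```python
def check_graph_row(row):
    """Validate one graph row against the documented shapes; return #edges."""
    num_nodes = row["num_nodes"]
    edge_index = row["edge_index"]
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        raise ValueError("edge_index must be 2 x #edges")
    num_edges = len(edge_index[0])
    if len(row["node_feat"]) != num_nodes:
        raise ValueError("node_feat must have one entry per node")
    if len(row["edge_attr"]) != num_edges:
        raise ValueError("edge_attr must have one entry per edge")
    # Every edge endpoint must refer to an existing node.
    for node in edge_index[0] + edge_index[1]:
        if not 0 <= node < num_nodes:
            raise ValueError(f"edge endpoint {node} is out of range")
    return num_edges


# Toy 3-node graph with made-up feature values, for illustration only.
toy_row = {
    "node_feat": [[6], [6], [1]],
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
    "edge_attr": [[1.4], [1.4], [1.1], [1.1]],
    "y": [-100.0],
    "num_nodes": 3,
}
toy_num_edges = check_graph_row(toy_row)
```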

Dataset usage

  • The data is not split; cross-validation is recommended
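
Since no predefined split exists, the folds have to be built by the user. The sketch below shows one way to do this with shuffled k-fold indices from the standard library; the function `k_fold_indices`, the fold count, and the seed are assumptions for illustration, not something the dataset card prescribes.

```python
import random


def k_fold_indices(num_graphs, num_folds=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if not 2 <= num_folds <= num_graphs:
        raise ValueError("num_folds must be between 2 and num_graphs")
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # fixed seed keeps folds reproducible
    # Deal the shuffled indices round-robin into num_folds disjoint folds.
    folds = [indices[i::num_folds] for i in range(num_folds)]
    for held_out in range(num_folds):
        test_indices = folds[held_out]
        train_indices = [
            index
            for fold_id, fold in enumerate(folds)
            if fold_id != held_out
            for index in fold
        ]
        yield train_indices, test_indices


# Small demonstration; for the full dataset pass num_graphs=226255.
splits = list(k_fold_indices(num_graphs=20, num_folds=5))
```

Each graph index lands in exactly one test fold, so every graph is used for evaluation once across the folds.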

License information

  • License: unknown

Citation information
