
graphs-datasets/alchemy

Hugging Face | Updated 2023-02-07 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/graphs-datasets/alchemy
Resource description:
---
licence: mit
task_categories:
- graph-ml
---

# Dataset Card for alchemy

## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [External Use](#external-use)
  - [PyGeometric](#pygeometric)
- [Dataset Structure](#dataset-structure)
  - [Data Properties](#data-properties)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description
- **[Homepage](https://alchemy.tencent.com/)**
- **Paper:** (see citation)
- **Leaderboard:** [Leaderboard](https://alchemy.tencent.com/)

### Dataset Summary
The `alchemy` dataset is a molecular dataset, called Alchemy, which lists 12 quantum mechanical properties of 130,000+ organic molecules comprising up to 12 heavy atoms (C, N, O, S, F and Cl), sampled from the GDBMedChem database.

### Supported Tasks and Leaderboards
`alchemy` should be used for organic quantum molecular property prediction, a regression task on 12 properties. The score used is the mean absolute error (MAE).
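To make the scoring concrete, here is a minimal sketch of MAE for a multi-target regression task. The function names `per_property_mae` and `mean_mae` are hypothetical, and averaging the per-property errors with equal weight is an assumption: the card does not say how the 12 per-property errors are combined, or whether they are normalised first.

```python
from typing import List, Sequence


def per_property_mae(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
) -> List[float]:
    """Return one MAE per target column (12 columns for alchemy)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    num_targets = len(y_true[0])
    totals = [0.0] * num_targets
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != num_targets or len(pred_row) != num_targets:
            raise ValueError("every row must have the same number of targets")
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            totals[j] += abs(t - p)
    return [total / len(y_true) for total in totals]


def mean_mae(
    y_true: Sequence[Sequence[float]],
    y_pred: Sequence[Sequence[float]],
) -> float:
    """Unweighted mean of the per-property MAEs (an assumed aggregation)."""
    maes = per_property_mae(y_true, y_pred)
    return sum(maes) / len(maes)


# Toy values: 2 molecules and 3 targets (the real task has 12 targets).
y_true = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
y_pred = [[1.5, 2.0, 2.0], [2.5, 3.0, 6.0]]
print(per_property_mae(y_true, y_pred))  # [0.5, 0.5, 0.5]
print(mean_mae(y_true, y_pred))  # 0.5
```

Because the 12 properties may be on different scales, reporting the per-property values alongside any single aggregate is usually more informative.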
## External Use

### PyGeometric
To load in PyGeometric, do the following:

```python
import torch
from datasets import load_dataset
from torch_geometric.data import Data
from torch_geometric.loader import DataLoader

dataset_hf = load_dataset("graphs-datasets/alchemy")
# Convert the "train" split (the card describes the data as unsplit).
# Data expects keyword tensors rather than a raw dict, so map each field.
dataset_pg_list = [
    Data(x=torch.tensor(g["node_feat"]), edge_index=torch.tensor(g["edge_index"]),
         edge_attr=torch.tensor(g["edge_attr"]), y=torch.tensor(g["y"]))
    for g in dataset_hf["train"]
]
dataset_pg = DataLoader(dataset_pg_list)
```

## Dataset Structure

### Data Properties

| property | value |
|---|---|
| scale | big |
| #graphs | 202578 |
| average #nodes | 10.101387606810183 |
| average #edges | 20.877326870011206 |

### Data Fields
Each row of a given file is a graph, with:
- `node_feat` (list: #nodes x #node-features): the node features
- `edge_index` (list: 2 x #edges): pairs of nodes constituting edges
- `edge_attr` (list: #edges x #edge-features): the features of the aforementioned edges
- `y` (list: 1 x #labels): the labels to predict (12 regression targets, according to the task description above)
- `num_nodes` (int): number of nodes of the graph

### Data Splits
This data is not split, and should be used with cross-validation. It comes from the PyGeometric version of the dataset.

## Additional Information

### Licensing Information
The dataset has been released under the MIT license.

### Citation Information
```
@inproceedings{Morris+2020,
  title={TUDataset: A collection of benchmark datasets for learning with graphs},
  author={Christopher Morris and Nils M. Kriege and Franka Bause and Kristian Kersting and Petra Mutzel and Marion Neumann},
  booktitle={ICML 2020 Workshop on Graph Representation Learning and Beyond (GRL+ 2020)},
  archivePrefix={arXiv},
  eprint={2007.08663},
  url={www.graphlearning.io},
  year={2020}
}
```
```
@article{DBLP:journals/corr/abs-1906-09427,
  author = {Guangyong Chen and Pengfei Chen and Chang{-}Yu Hsieh and Chee{-}Kong Lee and Benben Liao and Renjie Liao and Weiwen Liu and Jiezhong Qiu and Qiming Sun and Jie Tang and Richard S. Zemel and Shengyu Zhang},
  title = {Alchemy: {A} Quantum Chemistry Dataset for Benchmarking {AI} Models},
  journal = {CoRR},
  volume = {abs/1906.09427},
  year = {2019},
  url = {http://arxiv.org/abs/1906.09427},
  eprinttype = {arXiv},
  eprint = {1906.09427},
  timestamp = {Mon, 11 Nov 2019 12:55:11 +0100},
  biburl = {https://dblp.org/rec/journals/corr/abs-1906-09427.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```
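As a rough illustration of the row layout described under Data Fields, the sketch below checks that a single graph row is internally consistent before it is converted to tensors. The helper `check_graph_row` and the toy row are hypothetical, and the check that node ids are 0-based is an assumption not stated in the card.

```python
from typing import Any, Dict


def check_graph_row(row: Dict[str, Any]) -> None:
    """Raise ValueError if a row does not match the documented field shapes."""
    num_nodes = row["num_nodes"]
    if len(row["node_feat"]) != num_nodes:
        raise ValueError("node_feat must have one entry per node")
    # edge_index is 2 x #edges: one list of sources, one list of targets.
    sources, targets = row["edge_index"]
    if len(sources) != len(targets):
        raise ValueError("edge_index rows must have the same length")
    if len(row["edge_attr"]) != len(sources):
        raise ValueError("edge_attr must have one entry per edge")
    # Assumption: node ids are 0-based indices into node_feat.
    for node in list(sources) + list(targets):
        if not 0 <= node < num_nodes:
            raise ValueError(f"edge endpoint {node} is out of range")
    if len(row["y"]) != 1:
        raise ValueError("y must be a 1 x #labels list")


# Hypothetical toy row: 3 nodes, 2 undirected bonds stored in both directions.
toy_row = {
    "node_feat": [[1, 0], [0, 1], [0, 1]],
    "edge_index": [[0, 1, 1, 2], [1, 0, 2, 1]],
    "edge_attr": [[1], [1], [1], [1]],
    "y": [[0.1, -0.2]],
    "num_nodes": 3,
}
check_graph_row(toy_row)  # passes silently when the row is consistent
```

Running such a check on a few rows of the loaded dataset is a cheap way to confirm the actual number of labels in `y` before training.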

Provider:
graphs-datasets
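Summary of original information

Dataset card for alchemy

Dataset description

  • Dataset summary: the alchemy dataset is a molecular dataset, called Alchemy, that lists 12 quantum mechanical properties of 130,000+ organic molecules with up to 12 heavy atoms (C, N, O, S, F and Cl), sampled from the GDBMedChem database.
  • Supported tasks and leaderboards: alchemy is intended for organic quantum molecular property prediction, a regression task on 12 properties. The scoring metric is MAE.

Dataset structure

  • Data properties
    • Scale: big
    • Number of graphs: 202578
    • Average number of nodes: 10.101387606810183
    • Average number of edges: 20.877326870011206
  • Data fields
    • node_feat (list: #nodes x #node-features): the node features
    • edge_index (list: 2 x #edges): pairs of nodes constituting edges
    • edge_attr (list: #edges x #edge-features): the features of the edges above
    • y (list: 1 x #labels): the labels to predict (12 regression targets, according to the task description above)
    • num_nodes (int): number of nodes in the graph
  • Data splits: the data is not split and should be used with cross-validation.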
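Since the data ships unsplit and is meant to be used with cross-validation, one way to generate the folds is sketched below using only the standard library. The function `k_fold_indices`, the default `k=10`, and the fixed seed are illustrative assumptions, not part of the dataset card.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    num_graphs: int, k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    if not 2 <= k <= num_graphs:
        raise ValueError("k must be between 2 and the number of graphs")
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded for reproducible folds
    # Striding assigns every index to exactly one fold of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# Toy run: 10 graphs, 5 folds, so each fold holds out 2 graphs.
for train_idx, test_idx in k_fold_indices(10, k=5):
    print(len(train_idx), len(test_idx))  # 8 2
```

With the Hugging Face `datasets` library, the index lists could then be passed to `Dataset.select`, for example `dataset_hf["train"].select(train_idx)`, assuming the unsplit data is exposed under the `"train"` key as in the card's loading snippet.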

Additional information

  • Licensing information: the dataset is released under the MIT license.

