
katielink/moleculenet-benchmark

Hugging Face, updated 2023-08-28, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/katielink/moleculenet-benchmark
Description:
---
license: apache-2.0
tags:
- biology
- chemistry
configs:
- config_name: bace
  data_files:
  - split: train
    path: bace/train.csv
  - split: test
    path: bace/test.csv
  - split: val
    path: bace/valid.csv
- config_name: bbbp
  data_files:
  - split: train
    path: bbbp/train.csv
  - split: test
    path: bbbp/test.csv
  - split: val
    path: bbbp/valid.csv
- config_name: clintox
  data_files:
  - split: train
    path: clintox/train.csv
  - split: test
    path: clintox/test.csv
  - split: val
    path: clintox/valid.csv
- config_name: esol
  data_files:
  - split: train
    path: esol/train.csv
  - split: test
    path: esol/test.csv
  - split: val
    path: esol/valid.csv
- config_name: freesolv
  data_files:
  - split: train
    path: freesolv/train.csv
  - split: test
    path: freesolv/test.csv
  - split: val
    path: freesolv/valid.csv
- config_name: hiv
  data_files:
  - split: train
    path: hiv/train.csv
  - split: test
    path: hiv/test.csv
  - split: val
    path: hiv/valid.csv
- config_name: lipo
  data_files:
  - split: train
    path: lipo/train.csv
  - split: test
    path: lipo/test.csv
  - split: val
    path: lipo/valid.csv
- config_name: qm9
  data_files:
  - split: train
    path: qm9/train.csv
  - split: test
    path: qm9/test.csv
  - split: val
    path: qm9/valid.csv
- config_name: sider
  data_files:
  - split: train
    path: sider/train.csv
  - split: test
    path: sider/test.csv
  - split: val
    path: sider/valid.csv
- config_name: tox21
  data_files:
  - split: train
    path: tox21/train.csv
  - split: test
    path: tox21/test.csv
  - split: val
    path: tox21/valid.csv
---

# MoleculeNet Benchmark ([website](https://moleculenet.org/))

MoleculeNet is a benchmark specially designed for testing machine learning methods for molecular properties. To facilitate the development of molecular machine learning methods, this work curates a number of dataset collections and creates a suite of software that implements many known featurizations and previously proposed algorithms. All methods and datasets are integrated as parts of the open source DeepChem package (MIT license).

MoleculeNet is built upon multiple public databases. The full collection currently includes over 700,000 compounds tested on a range of different properties. We test the performance of various machine learning models with different featurizations on the datasets (detailed descriptions here), with all results reported in AUC-ROC, AUC-PRC, RMSE and MAE scores.

For users, please cite: Zhenqin Wu, Bharath Ramsundar, Evan N. Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S. Pappu, Karl Leswing, Vijay Pande, MoleculeNet: A Benchmark for Molecular Machine Learning, arXiv preprint, arXiv: 1703.00564, 2017.
Provider:
katielink
Summary of original information

MoleculeNet Benchmark

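Overview

MoleculeNet is a benchmark designed specifically for testing machine learning methods for molecular properties. It draws on multiple public databases and currently includes over 700,000 compounds tested on a range of different properties. Machine learning models with different featurizations are evaluated on the datasets, and all results are reported as AUC-ROC, AUC-PRC, RMSE, and MAE scores.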
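
As a rough illustration of three of the reported metrics, the sketch below computes RMSE, MAE, and AUC-ROC in plain Python. The function names and the pairwise AUC formulation are choices made for this example, not code taken from MoleculeNet or DeepChem; in practice a library such as scikit-learn would normally be used instead.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error for regression targets."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error for regression targets."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. This O(n^2) form is only meant
    for small illustrative inputs.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Regression configurations would typically be scored with RMSE or MAE, and classification configurations with AUC-ROC or AUC-PRC; which metric applies to which configuration is not stated on this page.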

Dataset configurations

  • bace
    • Train: bace/train.csv
    • Test: bace/test.csv
    • Validation: bace/valid.csv
  • bbbp
    • Train: bbbp/train.csv
    • Test: bbbp/test.csv
    • Validation: bbbp/valid.csv
  • clintox
    • Train: clintox/train.csv
    • Test: clintox/test.csv
    • Validation: clintox/valid.csv
  • esol
    • Train: esol/train.csv
    • Test: esol/test.csv
    • Validation: esol/valid.csv
  • freesolv
    • Train: freesolv/train.csv
    • Test: freesolv/test.csv
    • Validation: freesolv/valid.csv
  • hiv
    • Train: hiv/train.csv
    • Test: hiv/test.csv
    • Validation: hiv/valid.csv
  • lipo
    • Train: lipo/train.csv
    • Test: lipo/test.csv
    • Validation: lipo/valid.csv
  • qm9
    • Train: qm9/train.csv
    • Test: qm9/test.csv
    • Validation: qm9/valid.csv
  • sider
    • Train: sider/train.csv
    • Test: sider/test.csv
    • Validation: sider/valid.csv
  • tox21
    • Train: tox21/train.csv
    • Test: tox21/test.csv
    • Validation: tox21/valid.csv
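
The configuration layout above can be sketched in Python as follows. This is a minimal example that assumes the Hugging Face `datasets` library is installed and that the repository id is `katielink/moleculenet-benchmark`, as the page title suggests. `CONFIGS`, `split_files`, and `load_config` are hypothetical helper names introduced here, and the split names `train`, `test`, and `val` follow the card's YAML header.

```python
# Configuration names as listed on the dataset card.
CONFIGS = [
    "bace", "bbbp", "clintox", "esol", "freesolv",
    "hiv", "lipo", "qm9", "sider", "tox21",
]


def split_files(config):
    """Return the split-to-CSV-path mapping for one configuration."""
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    return {
        "train": f"{config}/train.csv",
        "test": f"{config}/test.csv",
        "val": f"{config}/valid.csv",  # the validation file is named valid.csv
    }


def load_config(config, repo_id="katielink/moleculenet-benchmark"):
    """Load one configuration from the Hub (requires network access).

    Assumption: the repository exposes the configurations named on the card,
    so the config name can be passed as the second argument of load_dataset.
    """
    if config not in CONFIGS:
        raise ValueError(f"unknown configuration: {config!r}")
    from datasets import load_dataset  # imported lazily; optional dependency
    return load_dataset(repo_id, config)
```

Calling `load_config("bace")` should then return a dataset dictionary keyed by `train`, `test`, and `val`, provided the repository is reachable. The column names inside each CSV are not listed on this page, so inspect them after loading.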