---
annotations_creators:
- machine-generated
language_creators:
- machine-generated
license:
- mit
multilinguality:
- monolingual
pretty_name: bbbp
size_categories:
- 1K<n<10K
source_datasets: []
tags:
- bio
- bio-chem
- molnet
- molecule-net
- biophysics
task_categories:
- other
task_ids: []
---
# Dataset Card for bbbp
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** https://moleculenet.org/
- **Repository:** https://github.com/deepchem/deepchem/tree/master
- **Paper:** https://arxiv.org/abs/1703.00564
### Dataset Summary
`bbbp` is a dataset included in [MoleculeNet](https://moleculenet.org/). It provides binary labels for blood-brain barrier penetration (permeability).
## Dataset Structure
### Data Fields
Each split contains the following fields:
* `smiles`: the [SMILES](https://en.wikipedia.org/wiki/Simplified_molecular-input_line-entry_system) representation of a molecule
* `selfies`: the [SELFIES](https://github.com/aspuru-guzik-group/selfies) representation of a molecule
* `target`: the binary label for blood-brain barrier penetration (permeability)
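
The three fields above can be pictured as a simple record type. The following is a minimal sketch, not code shipped with the dataset: the class name `BBBPRecord`, the helper `parse_record`, and the assumption that `target` is stored as the integer 0 or 1 are all illustrative, and the example row is a placeholder rather than an actual entry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBBPRecord:
    """Hypothetical container mirroring the three documented fields."""
    smiles: str   # SMILES string of the molecule
    selfies: str  # SELFIES string of the same molecule
    target: int   # assumed encoding: binary label stored as 0 or 1


def parse_record(row: dict) -> BBBPRecord:
    """Validate a raw row and convert it to a BBBPRecord."""
    missing = {"smiles", "selfies", "target"} - set(row)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    target = int(row["target"])
    if target not in (0, 1):
        raise ValueError(f"expected a binary target, got {row['target']!r}")
    return BBBPRecord(str(row["smiles"]), str(row["selfies"]), target)


# Placeholder row (ethanol); the label is made up and not taken from the dataset.
example = parse_record({"smiles": "CCO", "selfies": "[C][C][O]", "target": 1})
```

Validating rows this way makes a missing column or an unexpected label value fail early instead of surfacing later during training.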
### Data Splits
The dataset is divided into train, validation, and test sets in an 80/10/10 ratio using a scaffold split.
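
This card does not say which implementation produced the split. As a rough illustration of the idea, the sketch below groups molecules by a scaffold key and assigns whole groups to train, validation, and test, largest groups first, so that no scaffold appears in more than one split. The function names and the toy molecules are assumptions for illustration. `toy_scaffold` is a crude stand-in; a real pipeline would more likely derive Bemis-Murcko scaffolds with a cheminformatics toolkit such as RDKit.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def scaffold_split(
    smiles: Sequence[str],
    scaffold_fn: Callable[[str], str],
    frac_train: float = 0.8,
    frac_valid: float = 0.1,
) -> Tuple[List[int], List[int], List[int]]:
    """Return (train, valid, test) index lists; each scaffold lands in one split."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, smi in enumerate(smiles):
        groups[scaffold_fn(smi)].append(idx)

    # Largest scaffold groups first; ties broken by first index for determinism.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))

    n = len(smiles)
    train_cutoff = frac_train * n
    valid_cutoff = (frac_train + frac_valid) * n
    train: List[int] = []
    valid: List[int] = []
    test: List[int] = []
    for group in ordered:
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        elif len(train) + len(valid) + len(group) <= valid_cutoff:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test


def toy_scaffold(smi: str) -> str:
    """Hypothetical stand-in for a real scaffold function (substring match only)."""
    if "c1ccccc1" in smi:
        return "benzene"
    if "C1CCCCC1" in smi:
        return "cyclohexane"
    return "acyclic"


# Toy molecules for illustration only; they are not drawn from bbbp.
molecules = [
    "c1ccccc1O", "c1ccccc1N", "c1ccccc1C", "c1ccccc1Cl", "c1ccccc1Br",
    "c1ccccc1F", "c1ccccc1CC", "c1ccccc1CO", "C1CCCCC1O", "CCO",
]
train, valid, test = scaffold_split(molecules, toy_scaffold)
```

Because whole scaffold groups are kept together, the realized split sizes only approximate 80/10/10, and the test set tends to contain structurally less common molecules than a random split would.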
## Dataset Creation
### Source Data
#### Initial Data Collection and Normalization
The data was originally generated by the Pande Group at Stanford.
## Additional Information
### Licensing Information
This dataset was originally released under an MIT license.
### Citation Information
```
@misc{https://doi.org/10.48550/arxiv.1703.00564,
doi = {10.48550/ARXIV.1703.00564},
url = {https://arxiv.org/abs/1703.00564},
author = {Wu, Zhenqin and Ramsundar, Bharath and Feinberg, Evan N. and Gomes, Joseph and Geniesse, Caleb and Pappu, Aneesh S. and Leswing, Karl and Pande, Vijay},
keywords = {Machine Learning (cs.LG), Chemical Physics (physics.chem-ph), Machine Learning (stat.ML), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Physical sciences, FOS: Physical sciences},
title = {MoleculeNet: A Benchmark for Molecular Machine Learning},
publisher = {arXiv},
year = {2017},
copyright = {arXiv.org perpetual, non-exclusive license}
}
```
### Contributions
Thanks to [@zanussbaum](https://github.com/zanussbaum) for adding this dataset.