Greedjar74/Leash_Bio
Source: Hugging Face. Updated 2024-06-05. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/Greedjar74/Leash_Bio
Description:
---
dataset_info:
  features:
  - name: 'Unnamed: 0'
    dtype: int64
  - name: id
    dtype: int64
  - name: buildingblock1_smiles
    dtype: string
  - name: buildingblock2_smiles
    dtype: string
  - name: buildingblock3_smiles
    dtype: string
  - name: molecule_smiles
    dtype: string
  - name: protein_name
    dtype: string
  - name: binds
    dtype: int64
  - name: molecule
    dtype: string
  - name: ecfp
    dtype: string
  - name: reactivity
    dtype: float64
  - name: atomic_num
    dtype: int64
  - name: molecular weight
    dtype: float64
  - name: Steric strain
    dtype: int64
  - name: LogP
    dtype: float64
  - name: TPSA
    dtype: float64
  - name: NHBD
    dtype: int64
  - name: NHBA
    dtype: int64
  - name: planarity
    dtype: int64
  - name: PSA
    dtype: float64
  - name: QED
    dtype: float64
  - name: BRD4
    dtype: float64
  - name: HSA
    dtype: float64
  - name: sEH
    dtype: float64
  - name: BRD4.1
    dtype: float64
  - name: HSA.1
    dtype: float64
  - name: sEH.1
    dtype: float64
  splits:
  - name: train
    num_bytes: 204976649
    num_examples: 59058
  download_size: 27493403
  dataset_size: 204976649
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
This dataset is intended primarily for chemistry and bioinformatics work. It contains molecular structures in SMILES format, protein names, molecular properties (such as molecular weight, LogP, and TPSA), and the binding status of each molecule with respect to a protein. It has a single split, train, with 59058 examples.
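A minimal sketch of loading the split described above, assuming the Hugging Face `datasets` package is installed and the repository is still publicly reachable. The helper name `load_leash_bio` is hypothetical; `load_dataset` is the library's standard entry point.

```python
REPO_ID = "Greedjar74/Leash_Bio"  # repository name taken from the page title


def load_leash_bio(split="train"):
    """Download (or reuse the cached copy of) one split of the dataset.

    Hypothetical helper for illustration; it needs network access on first use.
    """
    # Imported lazily so that defining this helper does not require the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)
```

If the download succeeds, `load_leash_bio().num_rows` should agree with the example count listed on the card, and `column_names` should list the features declared in the metadata.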
Provider:
Greedjar74
Original information summary
Dataset overview
Dataset features
The dataset contains the following features:
- Unnamed: 0: integer
- id: integer
- buildingblock1_smiles: string
- buildingblock2_smiles: string
- buildingblock3_smiles: string
- molecule_smiles: string
- protein_name: string
- binds: integer
- molecule: string
- ecfp: string
- reactivity: float
- atomic_num: integer
- molecular weight: float
- Steric strain: integer
- LogP: float
- TPSA: float
- NHBD: integer
- NHBA: integer
- planarity: integer
- PSA: float
- QED: float
- BRD4: float
- HSA: float
- sEH: float
- BRD4.1: float
- HSA.1: float
- sEH.1: float
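The feature list can be written down as a plain mapping and used to sanity-check rows before further processing. This is a sketch that uses only the standard library. The column names and dtypes are copied from the card, while `PY_TYPES`, `dtype_counts`, and `check_row` are hypothetical names introduced for illustration.

```python
from collections import Counter

# Column name -> dtype, copied from the feature list on the dataset card.
FEATURES = {
    "Unnamed: 0": "int64",
    "id": "int64",
    "buildingblock1_smiles": "string",
    "buildingblock2_smiles": "string",
    "buildingblock3_smiles": "string",
    "molecule_smiles": "string",
    "protein_name": "string",
    "binds": "int64",
    "molecule": "string",
    "ecfp": "string",
    "reactivity": "float64",
    "atomic_num": "int64",
    "molecular weight": "float64",
    "Steric strain": "int64",
    "LogP": "float64",
    "TPSA": "float64",
    "NHBD": "int64",
    "NHBA": "int64",
    "planarity": "int64",
    "PSA": "float64",
    "QED": "float64",
    "BRD4": "float64",
    "HSA": "float64",
    "sEH": "float64",
    "BRD4.1": "float64",
    "HSA.1": "float64",
    "sEH.1": "float64",
}

# Python types accepted for each declared dtype (an illustrative assumption).
PY_TYPES = {"int64": int, "float64": (int, float), "string": str}


def dtype_counts(features=FEATURES):
    """Count how many columns are declared with each dtype."""
    return Counter(features.values())


def check_row(row, features=FEATURES):
    """Return a list of problems found in one row dict; empty means it fits."""
    problems = []
    for name, dtype in features.items():
        if name not in row:
            problems.append(f"missing column: {name}")
        elif row[name] is not None and not isinstance(row[name], PY_TYPES[dtype]):
            problems.append(
                f"{name}: expected {dtype}, got {type(row[name]).__name__}"
            )
    return problems
```

Calling `dtype_counts()` gives a quick breakdown of the schema by type, and `check_row(row)` can be applied to each record to flag missing or mistyped columns. Missing values (`None`) are deliberately accepted.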
Dataset splits
train: 59058 examples, with a dataset size of 204976649 bytes.
Dataset size
- Download size: 27493403 bytes
- Dataset size: 204976649 bytes
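The size figures above are easier to interpret after a little arithmetic. The sketch below derives the average footprint per example and the ratio between the prepared dataset and the download, assuming (as is usual for Hugging Face metadata) that the dataset size refers to the uncompressed prepared data and the download size to the files fetched from the hub.

```python
NUM_EXAMPLES = 59058       # examples in the train split
DATASET_SIZE = 204976649   # bytes, from the card
DOWNLOAD_SIZE = 27493403   # bytes, from the card

# Average number of bytes one example occupies in the prepared dataset.
bytes_per_example = DATASET_SIZE / NUM_EXAMPLES

# How many times larger the prepared dataset is than the download.
compression_factor = DATASET_SIZE / DOWNLOAD_SIZE

print(f"dataset:  {DATASET_SIZE / 1e6:.1f} MB")
print(f"download: {DOWNLOAD_SIZE / 1e6:.1f} MB")
print(f"about {bytes_per_example:.0f} bytes per example")
print(f"download is about {compression_factor:.1f}x smaller")
```

The large per-example footprint is consistent with each row carrying several SMILES strings plus the `molecule` and `ecfp` string columns.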
Configuration
default configuration:
- Training data path: data/train-*
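The training data path is a glob pattern, not a single file name. The sketch below shows which names such a pattern would select, using the standard library's `fnmatchcase` as an approximation of the hub's own pattern resolution. The file names in `candidates` are hypothetical and are not taken from the repository.

```python
from fnmatch import fnmatchcase

PATTERN = "data/train-*"  # from the default configuration

# Hypothetical file names, for illustration only.
candidates = [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]

# Keep only the names that the glob pattern selects.
train_files = [name for name in candidates if fnmatchcase(name, PATTERN)]
print(train_files)
```

Only names that start with `data/train-` are kept, so any number of shards can be added to the train split without changing the configuration.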



