Derify/safe_druglike_QED_Pfizer_11m
Source: Hugging Face. Updated 2026-01-01; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Derify/safe_druglike_QED_Pfizer_11m
Description:
---
dataset_info:
  features:
  - name: smiles
    dtype: string
  - name: safe
    dtype: string
  splits:
  - name: train
    num_bytes: 896618677.3205465
    num_examples: 9921351
  - name: validation
    num_bytes: 92803360.80148429
    num_examples: 1026838
  - name: test
    num_bytes: 40811883.70340547
    num_examples: 451559
  download_size: 578210759
  dataset_size: 1030233921.8254362
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
license: cc-by-4.0
task_categories:
- mask-generation
- feature-extraction
- text-generation
tags:
- chemistry
- smiles
- cheminformatics
- safe
pretty_name: SAFE Druglike QED Pfizer 11M
size_categories:
- 10M<n<100M
---
## Drug-like QED Pfizer 11M — SAFE Dataset
This dataset is derived from the [*Drug-like Molecule Datasets for Drug Discovery*](https://zenodo.org/records/7547717) collection. Molecular structures were converted to SAFE (Sequential Attachment-based Fragment Embedding) representations using **safe-mol v0.1.14**.
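To make the SAFE representation concrete, here is a minimal sketch. It assumes the `safe-mol` package exposes `safe.encode` and `safe.decode` (as in its public README; the exact behaviour of v0.1.14 is not verified here), and the example string is a hand-written SAFE form of ibuprofen, not a row taken from this dataset. The helper names `safe_fragments` and `roundtrip` are hypothetical.

```python
def safe_fragments(safe_string: str) -> list[str]:
    """Split a SAFE string into its dot-separated fragments.

    SAFE writes a molecule as fragments joined by '.', with shared
    ring-closure digits marking where the fragments attach to each other.
    """
    return safe_string.split(".")


def roundtrip(smiles: str) -> tuple[str, str]:
    """Encode SMILES to SAFE and decode it back (requires `pip install safe-mol`)."""
    import safe  # imported lazily so the rest of the sketch runs without safe-mol

    encoded = safe.encode(smiles)
    decoded = safe.decode(encoded, canonical=True)
    return encoded, decoded


# Illustrative SAFE string for ibuprofen (assumption: written by hand for this
# example; safe-mol may order fragments or number attachment points differently).
EXAMPLE_SAFE = "c12ccc3cc1.C3(C)C(=O)O.CC(C)C2"
fragments = safe_fragments(EXAMPLE_SAFE)
for fragment in fragments:
    print(fragment)
```

Here the digits `2` and `3` each appear in two different fragments, which is how the aromatic ring is tied to the isobutyl and propionic acid pieces. Calling `roundtrip("CC(C)Cc1ccc(cc1)C(C)C(=O)O")` would exercise the actual library, if it is installed.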
## Source
DOI: [10.5281/zenodo.7547717](https://doi.org/10.5281/zenodo.7547717)
## Dataset Information
### Features
- `smiles`: SMILES (Simplified Molecular Input Line Entry System) string, dtype `string`
- `safe`: SAFE (Sequential Attachment-based Fragment Embedding) string, dtype `string`
### Splits
- Train: 896618677.3205465 bytes, 9921351 examples
- Validation: 92803360.80148429 bytes, 1026838 examples
- Test: 40811883.70340547 bytes, 451559 examples
### Overall Size
- Download size: 578210759 bytes
- Dataset size: 1030233921.8254362 bytes
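The split figures above can be sanity-checked with a little arithmetic. The sketch below only re-uses the numbers listed in this card; the variable names are illustrative.

```python
import math

# Figures copied from the split listing in this card.
SPLITS = {
    "train": {"num_examples": 9_921_351, "num_bytes": 896618677.3205465},
    "validation": {"num_examples": 1_026_838, "num_bytes": 92803360.80148429},
    "test": {"num_examples": 451_559, "num_bytes": 40811883.70340547},
}
DATASET_SIZE = 1030233921.8254362  # reported total size in bytes

total_examples = sum(s["num_examples"] for s in SPLITS.values())
fractions = {name: s["num_examples"] / total_examples for name, s in SPLITS.items()}

for name, frac in fractions.items():
    print(f"{name:<10} {SPLITS[name]['num_examples']:>10,} {frac:7.2%}")
print(f"{'total':<10} {total_examples:>10,}")

# The per-split byte counts should add up to the reported dataset size.
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())
bytes_match = math.isclose(total_bytes, DATASET_SIZE, rel_tol=1e-9)
print("byte totals consistent:", bytes_match)
```

The three splits sum to 11,399,748 examples, consistent with the "11M" in the dataset name, and divide roughly 87% / 9% / 4% between train, validation, and test.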
### Configuration
- Default config, data files:
  - Train: `data/train-*`
  - Validation: `data/validation-*`
  - Test: `data/test-*`
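Given the default config and the three splits above, a minimal loading sketch with the Hugging Face `datasets` library might look like the following. It assumes `datasets` is installed and the Hub is reachable; `load_split` and `preview` are hypothetical helper names, and streaming is chosen here only to avoid downloading the full dataset up front.

```python
from itertools import islice

REPO_ID = "Derify/safe_druglike_QED_Pfizer_11m"
SPLITS = ("train", "validation", "test")


def load_split(split: str = "validation", streaming: bool = True):
    """Load one split of the dataset, streamed by default."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helper can be defined without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)


def preview(split: str = "validation", n: int = 3) -> list[tuple[str, str]]:
    """Return the first n (smiles, safe) pairs of a split."""
    rows = load_split(split, streaming=True)
    return [(row["smiles"], row["safe"]) for row in islice(rows, n)]
```

Calling `preview("test", n=5)` would fetch five rows over the network. Passing `streaming=False` to `load_split` downloads and caches the whole split instead.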
### License
Creative Commons Attribution 4.0 (CC BY 4.0)
### Task Categories
Mask generation, feature extraction, text generation
### Tags
Chemistry, SMILES, cheminformatics, SAFE
### Pretty Name
SAFE Druglike QED Pfizer 11M
### Size Category
10M < n < 100M
Provider: Derify



