BASF-AI/PubChemSMILESIsoDescPC
Source: Hugging Face. Updated 2024-09-23; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/BASF-AI/PubChemSMILESIsoDescPC
Description:
---
dataset_info:
  features:
  - name: description
    dtype: string
  - name: isomeric_smiles
    dtype: string
  - name: labels
    dtype: int64
  splits:
  - name: test
    num_bytes: 528788
    num_examples: 2048
  download_size: 238172
  dataset_size: 528788
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
task_categories:
- text-classification
language:
- en
tags:
- chemistry
- pubchem
- chemteb
pretty_name: PubChem Isomeric SMILES and Descriptions Pair Classification
size_categories:
- 1K<n<10K
license: cc-by-nc-sa-4.0
---
# PubChem Isomeric SMILES and Descriptions Pair Classification
This dataset contains pairs of isomeric SMILES strings and textual descriptions, each with a label indicating whether the two refer to the same chemical entity. A label of 1 means the SMILES string and the description correspond to the same entity; a label of 0 means they do not. The data is sourced from PubChem ([ChEBI](https://www.ebi.ac.uk/chebi/) source) and is intended for tasks such as chemical entity matching and description analysis.
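
As a minimal sketch of how the label scheme above might be used, the snippet below loads the `test` split and counts matching and non-matching pairs. It assumes the Hugging Face `datasets` package is installed and that the repository is reachable under the ID shown in this card. The helper names (`load_test_split`, `label_counts`, `matching_pairs`) and the `toy_rows` values are illustrative and are not part of the dataset.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Tuple

# Repository ID as given on this card.
DATASET_ID = "BASF-AI/PubChemSMILESIsoDescPC"


def load_test_split():
    """Download the single `test` split (needs `datasets` and network access)."""
    # Imported lazily so the helpers below work without the package installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="test")


def label_counts(rows: Iterable[Mapping]) -> Counter:
    """Count pairs labelled 1 (same entity) and 0 (different entities)."""
    return Counter(int(row["labels"]) for row in rows)


def matching_pairs(rows: Iterable[Mapping]) -> List[Tuple[str, str]]:
    """Return (isomeric_smiles, description) tuples for rows labelled 1."""
    return [
        (row["isomeric_smiles"], row["description"])
        for row in rows
        if int(row["labels"]) == 1
    ]


# Made-up rows for illustration only; they are NOT taken from the dataset.
toy_rows = [
    {"description": "A primary alcohol.", "isomeric_smiles": "CCO", "labels": 1},
    {"description": "An aromatic hydrocarbon.", "isomeric_smiles": "CCO", "labels": 0},
]

print(label_counts(toy_rows))
print(matching_pairs(toy_rows))
```

With network access, `label_counts(load_test_split())` would report the label balance over the 2048 test examples. The helpers only rely on the three column names listed in the card metadata (`description`, `isomeric_smiles`, `labels`).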
Provider:
BASF-AI



