BASF-AI/PubChemSMILESIsoTitleBM
Source: Hugging Face | Updated: 2024-09-23 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/BASF-AI/PubChemSMILESIsoTitleBM
Resource description:
---
dataset_info:
  features:
  - name: title
    dtype: string
  - name: isomeric_smiles
    dtype: string
  splits:
  - name: test
    num_bytes: 736013
    num_examples: 7070
  download_size: 351601
  dataset_size: 736013
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: cc-by-nc-sa-4.0
language:
- en
pretty_name: PubChem Isomeric SMILES and Titles Bitext Mining
tags:
- chemistry
- pubchem
- SMILES
- chemteb
size_categories:
- 1K<n<10K
---
# PubChem Isomeric SMILES and Titles Bitext Mining
This dataset contains two parallel lists, one of isomeric SMILES strings and one of entity titles, both sourced from PubChem ([ChEBI](https://www.ebi.ac.uk/chebi/) source). The task is bitext mining: for each SMILES string in the first list, retrieve the title in the second list that names the same chemical entity. The dataset is intended for evaluating chemical entity alignment and retrieval.
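As a rough illustration of the matching task, the sketch below pairs each SMILES vector with its most similar title vector by cosine similarity and scores the result. It is a minimal sketch under stated assumptions, not the official evaluation: the helper names (`load_pairs`, `mine_pairs`, `accuracy`) are hypothetical, the toy vectors stand in for the output of whatever embedding model is being evaluated, and row `i` of the `isomeric_smiles` column is assumed to correspond to row `i` of the `title` column.

```python
import math


def load_pairs():
    """Load the test split (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    ds = load_dataset("BASF-AI/PubChemSMILESIsoTitleBM", split="test")
    return ds["isomeric_smiles"], ds["title"]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(smiles_vecs, title_vecs):
    """For each SMILES vector, return the index of the most similar title vector."""
    return [
        max(range(len(title_vecs)), key=lambda j: cosine(s, title_vecs[j]))
        for s in smiles_vecs
    ]


def accuracy(predicted):
    """Fraction of SMILES matched to the title at the same row index (assumed gold)."""
    return sum(1 for i, j in enumerate(predicted) if i == j) / len(predicted)


# Toy embeddings standing in for a real model's output (hypothetical values).
smiles_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
title_vecs = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.3], [0.7, 0.1, 0.6]]

predicted = mine_pairs(smiles_vecs, title_vecs)
print(predicted, accuracy(predicted))
```

In practice, the two lists returned by `load_pairs()` would be embedded with the model under test and the resulting vectors passed to `mine_pairs` in place of the toy values. The brute-force search is quadratic in the number of rows, which is fine for a sketch but slow at the full 7,070 examples; a vectorized similarity matrix would be the usual replacement.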
Provider:
BASF-AI



