BASF-AI/CoconutSmiles2NameBitextMining2
Source: Hugging Face. Updated 2024-09-26. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/BASF-AI/CoconutSmiles2NameBitextMining2
Description:
---
dataset_info:
  features:
  - name: formula
    dtype: string
  - name: smiles
    dtype: string
  splits:
  - name: test
    num_bytes: 532021
    num_examples: 5000
  download_size: 261403
  dataset_size: 532021
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: cc-by-nc-sa-4.0
language:
- en
tags:
- chemistry
- coconutdb
- SMILES
- chemteb
size_categories:
- 1K<n<10K
pretty_name: CoconutDB SMILES to Formula Bitext Mining
---
# CoconutDB SMILES to Formula Bitext Mining
This dataset consists of two lists: one of SMILES strings (both isomeric and canonical) and one of the molecular formulas of the corresponding chemical entities, sourced from [CoconutDB](https://coconut.naturalproducts.net/). It is framed as a bitext mining task: each SMILES string in the first list must be aligned with its matching molecular formula in the second list. The dataset is a 5,000-sample subset of [BASF-We-Create-Chemistry/CoconutSmiles2NameBitextMining](https://huggingface.co/datasets/BASF-We-Create-Chemistry/CoconutSmiles2NameBitextMining).
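
The matching task can be illustrated with a minimal sketch. It assumes that row `i` of the `smiles` column pairs with row `i` of the `formula` column, and that some embedding model (not specified by this card) turns each string into a vector. The helper names `load_pairs`, `cosine`, `mine_pairs`, and `accuracy` are hypothetical, and the toy vectors are made-up stand-ins for real embeddings. Only `load_pairs` needs the Hugging Face `datasets` package and network access, and it is not called here.

```python
import math


def load_pairs(repo_id="BASF-AI/CoconutSmiles2NameBitextMining2"):
    """Return (smiles, formulas) lists from the test split.

    Assumption: requires the `datasets` package and network access.
    """
    from datasets import load_dataset  # lazy import: optional dependency

    ds = load_dataset(repo_id, split="test")
    return list(ds["smiles"]), list(ds["formula"])


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mine_pairs(src_vecs, tgt_vecs):
    """For each source vector, return the index of the most similar target."""
    return [
        max(range(len(tgt_vecs)), key=lambda j: cosine(s, tgt_vecs[j]))
        for s in src_vecs
    ]


def accuracy(predicted):
    """Fraction of sources matched to the target at the same index."""
    return sum(i == j for i, j in enumerate(predicted)) / len(predicted)


# Toy vectors standing in for model embeddings (made-up values).
smiles_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
formula_vecs = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.0], [0.0, 0.2, 0.9]]

predicted = mine_pairs(smiles_vecs, formula_vecs)
print(predicted, accuracy(predicted))
```

With a real model, the toy vectors would be replaced by embeddings of the strings returned by `load_pairs()`. Nearest-neighbour matching by cosine similarity is one common way to score bitext mining, not necessarily the procedure the dataset authors use.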
Provided by:
BASF-AI



