BASF-AI/CoconutSmiles2NameBitextMining
Source: Hugging Face. Updated 2024-09-26; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/BASF-AI/CoconutSmiles2NameBitextMining
Resource description:
---
language:
- en
license: cc-by-nc-sa-4.0
size_categories:
- 100K<n<1M
pretty_name: CoconutDB SMILES to Formula Bitext Mining
dataset_info:
  features:
  - name: formula
    dtype: string
  - name: smiles
    dtype: string
  splits:
  - name: test
    num_bytes: 65624656
    num_examples: 621631
  download_size: 31890909
  dataset_size: 65624656
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
tags:
- chemistry
- coconutdb
- SMILES
- chemteb
---
# CoconutDB SMILES to Formula Bitext Mining
This dataset consists of two lists sourced from [CoconutDB](https://coconut.naturalproducts.net/): one of SMILES strings (both isomeric and canonical) and one of the corresponding molecular formulas of the same chemical entities. The task is bitext mining: each SMILES string in the first list must be aligned with its matching molecular formula in the second list.
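The matching task described above can be illustrated with a minimal sketch. It is not the official evaluation code: `embed` is a hypothetical toy stand-in (character counts) for a real text-embedding model, and `mine_pairs` and `accuracy` are illustrative helper names. The sketch assumes the two lists are index-aligned, so the correct match for SMILES `i` is formula `i`.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical toy embedding: a bag of upper-cased characters.
    A real setup would call an embedding model here instead."""
    return Counter(text.upper())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mine_pairs(smiles_list, formula_list):
    """For each SMILES string, return the index of the most similar formula."""
    formula_vecs = [embed(f) for f in formula_list]
    matches = []
    for smiles in smiles_list:
        vec = embed(smiles)
        scores = [cosine(vec, fv) for fv in formula_vecs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


def accuracy(matches):
    """Fraction of items matched to their own index (assumes aligned lists)."""
    return sum(i == m for i, m in enumerate(matches)) / len(matches)


if __name__ == "__main__":
    # Tiny hand-written example, not taken from the dataset.
    smiles = ["CCO", "N"]
    formulas = ["C2H6O", "H3N"]
    matches = mine_pairs(smiles, formulas)
    print(matches, accuracy(matches))
```

To run this on the real data, the `smiles` and `formula` columns of the `test` split could be loaded with the Hugging Face `datasets` library, for example `load_dataset("BASF-AI/CoconutSmiles2NameBitextMining", split="test")`, and the toy `embed` replaced with the model under evaluation.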
Provider:
BASF-AI



