introvoyz041/Guacamol
Source: Hugging Face. Updated 2025-12-27. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/introvoyz041/Guacamol
Description:
---
dataset_info:
  features:
  - name: SELFIE
    dtype: string
  - name: SMILES
    dtype: string
  splits:
  - name: train
    num_bytes: 351930666
    num_examples: 1273077
  - name: val
    num_bytes: 21949958
    num_examples: 79564
  - name: test
    num_bytes: 65952355
    num_examples: 238694
  download_size: 147769741
  dataset_size: 439832979
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
  - split: test
    path: data/test-*
---
# Dataset Card for Guacamol
Dataset from the [Guacamol](https://github.com/BenevolentAI/guacamol) benchmark ([paper](https://arxiv.org/abs/1811.09621)).
The dataset contains two columns, `SMILES` and `SELFIE`. The splits are identical to the original splits, except that any SMILES string that could not be converted to a SELFIES string was dropped. Likewise, any SELFIES string in the val/test splits that contained a token not found in the train split was dropped.
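The vocabulary filter described above can be illustrated with a minimal sketch. This is not the preprocessing code used to build the dataset, which is not shown on this card. It assumes that a SELFIES string is a sequence of bracketed symbols and that a "token" is one such symbol. The helper names (`tokenize`, `build_vocab`, `filter_by_vocab`) and the toy rows are hypothetical, and the SMILES-to-SELFIES conversion step is not covered.

```python
import re

# Assumption: SELFIES strings are sequences of bracketed symbols,
# e.g. "[C][=O][Branch1]", and each bracketed symbol is one token.
SELFIES_TOKEN = re.compile(r"\[[^\]]*\]")


def tokenize(selfies: str) -> list[str]:
    """Split a SELFIES string into its bracketed symbols."""
    return SELFIES_TOKEN.findall(selfies)


def build_vocab(train_rows: list[str]) -> set[str]:
    """Collect every symbol that occurs in the training split."""
    return {tok for row in train_rows for tok in tokenize(row)}


def filter_by_vocab(rows: list[str], vocab: set[str]) -> list[str]:
    """Keep only rows whose symbols all appear in the training vocabulary."""
    return [row for row in rows if all(tok in vocab for tok in tokenize(row))]


# Toy data for illustration only; these are not rows from the dataset.
train = ["[C][C][O]", "[C][=O]"]
val = ["[C][O]", "[C][N]"]

vocab = build_vocab(train)
kept = filter_by_vocab(val, vocab)  # "[C][N]" is dropped: "[N]" is not in train
```

Filtering this way means a tokenizer fitted on the train split never meets an unknown symbol in val or test.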
The dataset can be used with [this tokenizer](https://huggingface.co/haydn-jones/GuacamolSELFIETokenizer).
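A short sketch of how the splits might be loaded and checked against the row counts in the card metadata. It assumes the Hugging Face `datasets` package is installed, that the machine has network access, and that the repository id matches the title of this page. `load_guacamol` and `split_size_mismatches` are hypothetical helper names, not part of any library.

```python
# Row counts copied from the dataset_info metadata of this card.
EXPECTED_ROWS = {"train": 1273077, "val": 79564, "test": 238694}

# Assumption: the repository id matches the title of this page.
REPO_ID = "introvoyz041/Guacamol"


def load_guacamol(repo_id: str = REPO_ID):
    """Download all splits (needs `pip install datasets` and network access)."""
    # Imported lazily so the rest of this module works without the package.
    from datasets import load_dataset

    return load_dataset(repo_id)


def split_size_mismatches(actual_rows: dict) -> dict:
    """Return {split: (expected, actual)} for every split whose row count differs."""
    return {
        split: (expected, actual_rows.get(split))
        for split, expected in EXPECTED_ROWS.items()
        if actual_rows.get(split) != expected
    }
```

For example, after `ds = load_guacamol()`, the call `split_size_mismatches({name: split.num_rows for name, split in ds.items()})` should return an empty dictionary if the download matches the card. Each row then exposes the two string columns, `ds["train"][0]["SMILES"]` and `ds["train"][0]["SELFIE"]`.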
Provided by:
introvoyz041



