
introvoyz041/ZINC20

Hugging Face · Updated 2026-03-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/introvoyz041/ZINC20
Description:
---
dataset_info:
  features:
  - name: smiles
    dtype: large_string
  - name: zinc_id
    dtype: int64
  - name: SELFIES
    dtype: string
  splits:
  - name: train
    num_bytes: 393170565049
    num_examples: 1538340669
  - name: val
    num_bytes: 47753116448
    num_examples: 192292584
  - name: test
    num_bytes: 46114402425
    num_examples: 192292584
  download_size: 174349539018
  dataset_size: 487038083922
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: val
    path: data/val-*
  - split: test
    path: data/test-*
license: mit
tags:
- chemistry
- biology
- medical
size_categories:
- 1B<n<10B
---

[ZINC20](https://zinc20.docking.org/) dataset with [SELFIES](https://arxiv.org/abs/1905.13741) added. Any SMILES string that could not be successfully converted was dropped from the dataset. Every tranche was downloaded; this is not the ~1B example ML subset from https://files.docking.org/zinc20-ML/. The dataset was shuffled in its entirety and then split 80%/10%/10% into train/val/test. A file vocab.csv in the root of the repository contains all of the SELFIES tokens found in the data, with [START], [STOP], and [PAD] added.
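The shuffle, 80%/10%/10% split, and vocabulary steps described above can be illustrated on a toy scale. The following is a minimal sketch using only the Python standard library, not the author's actual pipeline: the function names, the fixed seed, the rounding rule for split sizes, and the regex tokenizer (which assumes every SELFIES symbol is a bracketed token, with `.` separating fragments) are all assumptions made for illustration. The layout of vocab.csv is not described in the source, so the sketch only builds an in-memory token list.

```python
import random
import re

# Special tokens named in the dataset description.
SPECIAL_TOKENS = ["[START]", "[STOP]", "[PAD]"]

# Assumption: SELFIES symbols are bracketed tokens; "." separates fragments.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|\.")


def split_selfies_tokens(selfies_string):
    """Split a SELFIES string into its symbols (hypothetical helper)."""
    return _TOKEN_RE.findall(selfies_string)


def shuffle_and_split(records, seed=0, val_frac=0.1, test_frac=0.1):
    """Shuffle all records, then cut them into train/val/test slices."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    n_train = n - n_val - n_test  # the remainder goes to train
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


def build_vocab(selfies_strings):
    """Collect every distinct token and prepend the special tokens."""
    tokens = set()
    for s in selfies_strings:
        tokens.update(split_selfies_tokens(s))
    return SPECIAL_TOKENS + sorted(tokens)


# Toy records with the same three fields as the dataset (illustrative values).
toy_records = [
    {"zinc_id": i, "smiles": "CCO", "SELFIES": "[C][C][O]"} for i in range(8)
] + [
    {"zinc_id": 8, "smiles": "C=O", "SELFIES": "[C][=O]"},
    {"zinc_id": 9, "smiles": "C=O", "SELFIES": "[C][=O]"},
]

splits = shuffle_and_split(toy_records)
vocab = build_vocab(r["SELFIES"] for r in toy_records)

print({name: len(rows) for name, rows in splits.items()})
print(vocab)
```

The published split sizes are consistent with this kind of split: 1,538,340,669 + 192,292,584 + 192,292,584 = 1,922,925,837 examples in total, and each of val and test is 10% of that total rounded to the nearest integer. At the real scale, an in-memory shuffle like this one would not be practical; it is shown only to make the steps concrete.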
Provided by:
introvoyz041