
ZINC15 for Drug Similarity Search

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/11090228
Resource description:
Abstract: This dataset is a subset of the ZINC15 database, filtered and processed for molecular similarity search applications using MegaMolBART embeddings. The subset focuses on drug-like molecules with specific physicochemical and purchasability properties.

Keywords: ZINC15, Molecular Similarity Search, MegaMolBART, Drug Discovery, Cheminformatics.

Background: The ZINC15 database is a comprehensive collection of commercially available compounds for virtual screening. This subset was created to facilitate the development of machine learning models for drug discovery, particularly those based on molecular embeddings.

Methodology: The ZINC15 database was queried using the following criteria:
  Molecular weight <= 500 Daltons
  LogP <= 5
  Reactivity level = "reactive"
  Purchasability = "annotated"
The resulting dataset was then processed to extract MegaMolBART embeddings for each molecule.

Data Description: The dataset is organized into three folders:
  /data/project/ubrite/drg-depot/zinc15-similarity-search/raw-data/ (66 GB): The raw data files obtained from the ZINC15 database after applying the filtering criteria.
  /data/project/ubrite/drg-depot/zinc15-similarity-search/processed-data/ (13 GB): The processed data, including the extracted MegaMolBART embeddings for each molecule.
  /data/project/ubrite/drg-depot/zinc15-similarity-search/query/: Sample SMILES strings and their corresponding embeddings for performing similarity searches.

Technical Specifications:
  Format: SMILES strings, numerical data (embeddings)
  Size: 79 GB (total)

License: This dataset is derived from the ZINC15 database and processed using MegaMolBART. It is subject to the licenses of both the ZINC15 database and the MegaMolBART model.
  ZINC15 Database: ZINC15 data is made available under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license. For more information, please visit the ZINC15 website.
  MegaMolBART: The MegaMolBART model and its associated data are copyrighted by AstraZeneca and NVIDIA. The usage of MegaMolBART is subject to the terms and conditions specified by the copyright holders.
By using this dataset, you agree to comply with the licenses and conditions imposed by the ZINC15 database and MegaMolBART.

Access and Usage: The dataset is available for download through Zenodo. Users are encouraged to acknowledge this dataset and the corresponding Zenodo entry in any publications or research projects that utilize the data.

Contact: Fuad Al Abir, fuad021@uab.edu
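
Example: The description does not say how the similarity search itself is performed or how the embedding files are laid out. As a minimal sketch, assuming embeddings are compared by cosine similarity and are already loaded into memory as plain lists of floats keyed by SMILES string, a top-k lookup might look like the following. The function names, the toy 4-dimensional vectors, and the choice of cosine similarity are illustrative assumptions, not part of the dataset.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k_similar(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(smiles, cosine_similarity(query, emb)) for smiles, emb in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors standing in for real MegaMolBART embeddings,
# which have far more dimensions and would be read from processed-data/.
library = {
    "CCO": [0.9, 0.1, 0.0, 0.2],
    "CCN": [0.8, 0.2, 0.1, 0.3],
    "c1ccccc1": [0.0, 0.9, 0.7, 0.1],
}
query = [0.9, 0.1, 0.0, 0.2]  # stand-in for an embedding from query/

hits = top_k_similar(query, library, k=2)
for smiles, score in hits:
    print(f"{smiles}\t{score:.4f}")
```

A brute-force scan like this is only practical for small samples. At the scale of the full processed data (13 GB), a vectorized or approximate nearest-neighbor index would likely be needed.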
Created:
2024-07-06