Cheminformatics Analysis of RNA-Binding Ligands (Master Thesis Dataset)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14800358
Summary:
Title: ThesisDataset_RNA-binders
Description: This dataset is part of the master's thesis “Cheminformatics Analysis of RNA-Binding Ligands” by Ing. Jozef Fülöp (Faculty of Chemical Technology, Prague, 2024). It comprises raw and processed data for RNA-binding molecule classification.
The dataset is divided into two parts:
Set1 Large – Contains 77,420 compounds from specialized chemical libraries including the Enamine Hit Locator Library, ChemDiv miRNA-targeted Library, Enamine RNA Library, Life Chemicals RNA Focused and Targeted Libraries, and the ROBIN Database.
Set2 Small – A curated collection of 3,922 compounds for binary classification, comprising RNA binders and non-binders from the ROBIN repository, protein binders from the Probes & Drugs database, and non-binders from ZINC15’s Dark Matter.
Data Processing:
• Raw data were provided in SDF/CSV formats; only the SMILES strings were extracted for further analysis.
• Chemical structures were converted to canonical SMILES using RDKit.
• Standardization was performed with the ChEMBL Structure Pipeline.
• Duplicate entries were removed based on canonical SMILES.
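The deduplication step can be sketched with the standard library alone. This is a minimal illustration, not the thesis code: `deduplicate_by_smiles` is a hypothetical helper, and its `canonicalize` argument stands in for the RDKit and ChEMBL Structure Pipeline steps (it defaults to the identity function, so the inputs must already be canonical). When several libraries supply the same structure, the sketch simply keeps the first record, because the description does not say which source was retained.

```python
from typing import Callable, Iterable


def deduplicate_by_smiles(
    records: Iterable[dict],
    canonicalize: Callable[[str], str] = lambda s: s,
) -> list[dict]:
    """Keep the first record for each canonical SMILES and drop later duplicates.

    Hypothetical helper. `canonicalize` is a placeholder for RDKit/ChEMBL
    standardization; the identity default lets the sketch run without RDKit.
    """
    seen: set[str] = set()
    unique: list[dict] = []
    for record in records:
        key = canonicalize(record["smiles"])
        if key in seen:
            continue  # same structure already kept from an earlier record
        seen.add(key)
        unique.append({**record, "smiles": key})  # store the canonical form
    return unique


if __name__ == "__main__":
    rows = [
        {"smiles": "Oc1ccccc1", "source": "Enamine RNA Library"},
        {"smiles": "Oc1ccccc1", "source": "ROBIN"},
        {"smiles": "CCO", "source": "ChemDiv miRNA-targeted Library"},
    ]
    for row in deduplicate_by_smiles(rows):
        print(row["source"], row["smiles"])
```

With RDKit installed, one could pass something like `canonicalize=lambda s: Chem.MolToSmiles(Chem.MolFromSmiles(s))`; `Chem.MolFromSmiles` returns `None` for unparsable input, so invalid SMILES would need to be filtered out first.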
Columns in the processed CSV files:
• Set1 Large CSV:
  - “smiles”: Canonical SMILES representation.
  - “source”: Library origin.
  - “ecfp6”: 2048-bit Extended Connectivity Fingerprint (radius = 3).
  - “bit_info_map”: Dictionary mapping fingerprint bit positions to molecular fragments.
  - “rna”: Binary indicator of RNA binding (0 or 1).
• Set2 Small CSVs:
  - “source”: Originating library or subset.
  - “smiles”: Canonical SMILES representation.
  - “ecfp6”: 2048-bit fingerprint.
  - “bit_info_map”: Dictionary mapping fingerprint bit positions to molecular fragments.
  - “label”: Binary binding label (1 = binder, 0 = non-binder).
License: This dataset is released under the CC BY 4.0 license. Users are kindly requested to cite the associated thesis when using the data.
For further information, please refer to the thesis or contact fulopj@vscht.cz.
Created: 2025-02-17



