
Cheminformatics Analysis of RNA-Binding Ligands (Master Thesis Dataset)

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/14800358
Resource description:
Title: ThesisDataset_RNA-binders

Description: This dataset is part of the master thesis “Cheminformatics Analysis of RNA-Binding Ligands” by Ing. Jozef Fülöp (Faculty of Chemical Technology, Prague, 2024). It comprises raw and processed data for RNA-binding molecule classification. The dataset is divided into two parts:
• Set1 Large: contains 77,420 compounds from specialized chemical libraries including the Enamine Hit Locator Library, ChemDiv miRNA-targeted Library, Enamine RNA Library, Life Chemicals RNA Focused and Targeted Libraries, and the ROBIN Database.
• Set2 Small: a curated collection of 3,922 compounds for binary classification including RNA binders and non-binders from the ROBIN repository, protein binders from the Probes & Drugs database, and non-binders from ZINC15’s Dark Matter.

Data Processing:
• Raw data were provided in SDF/CSV formats; only the SMILES were extracted for further analysis.
• Chemical structures were converted to canonical SMILES using RDKit.
• Standardization was performed with the ChEMBL Structure Pipeline.
• Duplicate entries were removed based on canonical SMILES.

Columns in the processed CSV files:
• Set1 Large CSV:
  - “smiles”: Canonical SMILES representation.
  - “source”: Library origin.
  - “ecfp6”: 2048-bit Extended Connectivity Fingerprint (radius=3).
  - “bit_info_map”: Dictionary mapping fingerprint bit positions to molecular fragments.
  - “rna”: Binary indicator for RNA-binding (0 or 1).
• Set2 Small CSVs:
  - “source”: Originating library or subset.
  - “smiles”: Canonical SMILES representation.
  - “ecfp6”: 2048-bit fingerprint.
  - “bit_info_map”: Dictionary mapping fingerprint bit positions to molecular fragments.
  - “label”: Binary binding label (1 for binder, 0 for non-binder).

License: This dataset is released under the CC BY 4.0 license. Users are kindly requested to cite the associated thesis when using the data.

For further information, please refer to the thesis or contact fulopj@vscht.cz
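As a rough illustration of how the documented Set2 Small column layout might be consumed, the sketch below tallies binder and non-binder counts per source and checks that no SMILES string appears twice, which is what the deduplication step described above would imply. It uses only the Python standard library. The function name `summarize_set2`, the sample rows, and the exact spelling of the values in the “source” column are assumptions made for illustration; they are not taken from the dataset, and the “ecfp6” and “bit_info_map” columns are left out of the sample because their on-disk serialization is not specified in the description.

```python
import csv
import io
from collections import Counter


def summarize_set2(csv_file):
    """Count rows per (source, label) pair and collect repeated SMILES strings.

    Hypothetical helper: it relies only on the documented column names
    "source", "smiles" and "label"; any other columns are ignored.
    """
    counts = Counter()
    seen = set()
    duplicates = []
    for row in csv.DictReader(csv_file):
        counts[(row["source"], int(row["label"]))] += 1
        if row["smiles"] in seen:
            duplicates.append(row["smiles"])
        seen.add(row["smiles"])
    return counts, duplicates


# Made-up toy rows in the documented layout (NOT real dataset entries).
SAMPLE = """source,smiles,label
ROBIN,CCO,1
ROBIN,CCN,0
ZINC15,CCC,0
"""

counts, duplicates = summarize_set2(io.StringIO(SAMPLE))
for (source, label), n in sorted(counts.items()):
    print(f"{source}\tlabel={label}\t{n}")
print("repeated SMILES:", duplicates)
```

To run the same check on a downloaded file, pass an open file handle instead of the in-memory sample, for example `summarize_set2(open(path, newline=""))`, where `path` points to one of the Set2 Small CSVs.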
Created:
2025-02-17