DyePerm_Dataset
Source: DataCite Commons. Updated: 2025-12-04. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/DyePerm_Dataset/30764654/5
Description:
DyePermDB is a curated dataset of 202 fluorescent and chromogenic dyes with experimentally supported membrane-permeability annotations. The resource integrates structural identifiers, physicochemical attributes, qualitative solubility, toxicity notes, and literature evidence from PubChem, DrugBank and primary publications. Each dye is annotated with one of three permeability labels (“Yes”, “Yes (conditional)”, “No”), independently reviewed and cross-validated by domain experts. To assess dataset quality and structural coherence, we performed descriptive statistical analyses, XGBoost-based permeability classification using FP4 fingerprints, and feature-importance evaluation via random forests, revealing strong structure–permeability signals driven by heteroatom content and SMILES-derived features. The repository includes the full dataset, DrugBank-linked subset, reproducible train/test splits, and Python scripts for all modelling tasks. This dataset supports cheminformatics research, QSAR/QSPR modelling, fluorescent probe selection, and dye-oriented drug repurposing studies.
Provider:
figshare
Created:
2025-12-03



