
DyePerm_Dataset

Source: DataCite Commons. Updated: 2025-12-03. Indexed: 2026-04-25.
Download link:
https://figshare.com/articles/dataset/DyePerm_Dataset/30764654/3
Description:
DyePermDB is a curated dataset of 202 fluorescent and chromogenic dyes with experimentally supported membrane-permeability annotations. The resource integrates structural identifiers, physicochemical attributes, qualitative solubility, toxicity notes, and literature evidence from PubChem, DrugBank, and primary publications. Each dye is annotated with one of three permeability labels (“Yes”, “Yes (conditional)”, “No”), independently reviewed and cross-validated by domain experts.

To assess dataset quality and structural coherence, we performed descriptive statistical analyses, XGBoost-based permeability classification using FP4 fingerprints, and feature-importance evaluation via random forests, revealing strong structure–permeability signals driven by heteroatom content and SMILES-derived features. The repository includes the full dataset, a DrugBank-linked subset, reproducible train/test splits, and Python scripts for all modelling tasks.

This dataset supports cheminformatics research, QSAR/QSPR modelling, fluorescent probe selection, and dye-oriented drug repurposing studies.
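As a rough illustration of what a reproducible, label-balanced train/test split over the three permeability labels could look like, here is a minimal standard-library sketch. It is not the repository's own script: the field name `permeability`, the 20% test fraction, the seed, and the placeholder rows are all assumptions made for this example, and the actual column names and split procedure should be taken from the figshare files.

```python
import random
from collections import Counter, defaultdict

# The three permeability labels named in the dataset description.
LABELS = ("Yes", "Yes (conditional)", "No")


def stratified_split(rows, label_key="permeability", test_fraction=0.2, seed=42):
    """Split rows into train/test so each label keeps roughly the same share.

    `label_key`, `test_fraction`, and `seed` are illustrative assumptions,
    not values taken from the repository's scripts.
    """
    rng = random.Random(seed)  # fixed seed makes the split repeatable
    by_label = defaultdict(list)
    for row in rows:
        by_label[row[label_key]].append(row)

    train, test = [], []
    for label in sorted(by_label):  # sorted so the order is deterministic
        group = by_label[label][:]
        rng.shuffle(group)
        # At least one test example per label; a label with a single row
        # therefore ends up only in the test set.
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# Placeholder rows for demonstration only (NOT real records from the dataset).
toy_rows = [{"name": f"dye_{i}", "permeability": LABELS[i % 3]} for i in range(30)]

train, test = stratified_split(toy_rows)
print(Counter(r["permeability"] for r in train))
print(Counter(r["permeability"] for r in test))
```

Splitting per label matters for a small set such as this one (202 dyes), since a plain random split could leave one of the three classes nearly absent from the test portion.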

Provider:
figshare
Created:
2025-12-03