Supporting Information for "A Fragment Library of Natural Products and its Comparative Chemoinformatic Characterization"
Source: DataCite Commons; updated 2020-08-25; indexed 2024-07-28
Download link:
https://figshare.com/articles/Supporting_Information_for_A_Fragment_Library_of_Natural_Products_and_its_Comparative_Chemoinformatic_Characterization_/11997951/3
Resource description:
COCONUT_Compounds.sdf, ChEMBL_Compounds.csv and REAL_Compounds.csv contain the curated structures of the drug-like subsets of these three major compound data sets. Each file provides the following information for every compound: identification number (ID), SMILES string (Smiles), average molecular weight (AMW), octanol/water partition coefficient (SlogP), number of hydrogen bond donors (HBD), number of hydrogen bond acceptors (HBA), number of rotatable bonds (RB), topological polar surface area (TPSA), fraction of sp3 carbons (FractionCSP3), fraction of chiral carbons (FractionCC), number of generated fragments (NFragments) and, if any were obtained, the list of those fragments (LFragments).
COCONUT_Fragments.sdf, ChEMBL_Fragments.csv and REAL_Fragments.csv contain the fragment structures generated from the respective compound data sets. Each file provides the following information for every fragment: identification number (ID), source collection (Data Set), SMILES string of the fragment (Fragment), uniqueness (Unique), number of compounds in the data set that contain the fragment (Counts), the corresponding fraction of compounds (Proportion), fraction of sp3 carbons (FractionCSP3) and fraction of chiral carbons (FractionCC).
SB-DFPs.csv contains the Statistical-Based Database Fingerprints (SB-DFPs) for the COCONUT and REAL data sets. The file lists the value of each bit of a 1024-bit Morgan fingerprint of radius 2, computed with RDKit, together with the empirical minimum and maximum Tanimoto similarity values used to scale the data (MinSimilarity and MaxSimilarity).
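The relationship between the per-compound fragment lists (LFragments) and the per-fragment Counts and Proportion columns can be sketched as follows. This is only an illustration under two assumptions that the description does not spell out: that Proportion is Counts divided by the total number of compounds in the data set, and that a compound is counted once per fragment even if the fragment occurs in it several times. The compound IDs and fragment SMILES are invented toy values, not entries from the deposited files.

```python
from collections import Counter


def fragment_statistics(compound_fragments):
    """Return {fragment: (counts, proportion)} for one data set.

    compound_fragments maps a compound ID to the list of fragment SMILES
    generated from it (the role played by LFragments in the compound files).
    """
    n_compounds = len(compound_fragments)
    counts = Counter()
    for fragments in compound_fragments.values():
        # Using a set counts each compound once per fragment (assumption).
        counts.update(set(fragments))
    # Assumed definition: Proportion = Counts / number of compounds.
    return {frag: (n, n / n_compounds) for frag, n in counts.items()}


# Hypothetical toy data set: three compounds, one of them without fragments.
toy = {
    "C1": ["c1ccccc1", "C1CCOC1"],
    "C2": ["c1ccccc1", "c1ccccc1"],
    "C3": [],
}
stats = fragment_statistics(toy)
```

Here `stats` maps each fragment to a `(Counts, Proportion)` pair, so a fragment found in two of the three toy compounds gets a count of 2 and a proportion of 2/3.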
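As a rough illustration of how a database fingerprint and the MinSimilarity/MaxSimilarity values might be used together, the sketch below computes the Tanimoto similarity between two binary fingerprints and rescales it. The linear min-max form of the scaling is an assumption; the description only states that the two values are used for scaling. The 8-bit vectors and the bounds 0.25 and 0.75 are toy values standing in for the 1024-bit fingerprints and the empirical bounds in SB-DFPs.csv.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def scale_similarity(sim, min_similarity, max_similarity):
    """Linear min-max scaling (assumed form) of a similarity value."""
    if max_similarity <= min_similarity:
        raise ValueError("max_similarity must exceed min_similarity")
    return (sim - min_similarity) / (max_similarity - min_similarity)


# Hypothetical 8-bit fingerprints; the deposited fingerprints have 1024 bits.
database_fp = [1, 1, 0, 0, 1, 0, 0, 0]
molecule_fp = [1, 0, 0, 0, 1, 1, 0, 0]

raw = tanimoto(database_fp, molecule_fp)
scaled = scale_similarity(raw, min_similarity=0.25, max_similarity=0.75)
```

With these toy inputs two bits are shared and four are set in at least one vector, so the raw similarity is 0.5, which also maps to 0.5 after scaling between 0.25 and 0.75.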
Provider:
figshare
Created:
2020-03-18



