Untitled PCL
Source: NIAID Data Ecosystem. Indexed: 2026-03-13
Download link:
https://zenodo.org/record/5995570
Description:
This is the repository for regular updates of the PubChemLite for
Exposomics data collection. PubChemLite for Exposomics is a subset of
PubChem selected from major categories of the Table of Contents page at
the PubChem Classification Browser, described in
DOI:10.1186/s13321-021-00489-0.
PubChemLite for Exposomics is compiled from 10 categories:
AgroChemInfo, BioPathway, DrugMedicInfo, FoodRelated, PharmacoInfo,
SafetyInfo, ToxicityInfo, KnownUse, DisorderDisease, Identification.
PubChem CIDs have been collapsed by InChIKey first block, reporting the
structure from the most annotated CID, plus related CIDs. Entries that
will be ignored by MetFrag (salts, disconnected substances) or cause
errors (e.g. transition metals) have been removed. The Patent and
PubMed ID counts are extracted from files on the PubChem FTP site. The
'AnnoTypeCount' term counts how many of the categories are represented;
each subsequent column (named per category) counts the number of
annotation categories available in the next sub-category of the TOC entry.
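
The collapsing step described above can be illustrated with a minimal Python sketch. It is not the pipeline used to build PubChemLite. The record layout (`cid`, `inchikey`, one count per category), the tie-break on lowest CID, and the reading of "most annotated" as "highest AnnoTypeCount" are all assumptions made for illustration, and the InChIKeys are toy placeholders rather than real compounds.

```python
from collections import defaultdict

# The 10 category names come from the description; using them as
# dictionary keys on each record is an assumption of this sketch.
CATEGORIES = [
    "AgroChemInfo", "BioPathway", "DrugMedicInfo", "FoodRelated",
    "PharmacoInfo", "SafetyInfo", "ToxicityInfo", "KnownUse",
    "DisorderDisease", "Identification",
]


def anno_type_count(record):
    """Count how many of the categories have at least one annotation."""
    return sum(1 for name in CATEGORIES if record.get(name, 0) > 0)


def collapse_by_first_block(records):
    """Group records by InChIKey first block and pick one representative.

    Assumption: "most annotated" means the highest AnnoTypeCount,
    with ties broken by the lowest CID.
    """
    groups = defaultdict(list)
    for rec in records:
        # The first block is the part of the InChIKey before the first hyphen.
        first_block = rec["inchikey"].split("-")[0]
        groups[first_block].append(rec)

    collapsed = {}
    for first_block, members in groups.items():
        best = min(members, key=lambda r: (-anno_type_count(r), r["cid"]))
        collapsed[first_block] = {
            "cid": best["cid"],
            "inchikey": best["inchikey"],
            "anno_type_count": anno_type_count(best),
            "related_cids": sorted(
                r["cid"] for r in members if r["cid"] != best["cid"]
            ),
        }
    return collapsed


# Toy records (placeholder InChIKeys, invented counts).
toy = [
    {"cid": 1, "inchikey": "AAAAAAAAAAAAAA-UHFFFAOYSA-N", "SafetyInfo": 2},
    {"cid": 2, "inchikey": "AAAAAAAAAAAAAA-ZZZZZZZZSA-N",
     "SafetyInfo": 1, "KnownUse": 3},
    {"cid": 3, "inchikey": "BBBBBBBBBBBBBB-UHFFFAOYSA-N"},
]
result = collapse_by_first_block(toy)
```

In this toy input, CIDs 1 and 2 share a first block, so they collapse into one entry; CID 2 covers two categories and becomes the representative, with CID 1 kept as a related CID.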
These files can be used 'as is' as localCSV for MetFrag Command Line.
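
As a rough sketch of what using a file as localCSV might look like, the Python snippet below builds the two database-related lines of a MetFrag parameter file. The keys `MetFragDatabaseType` and `LocalDatabasePath` are given as recalled from the MetFrag command-line documentation and should be checked against the version in use. The file name is hypothetical, and a complete parameter file needs further settings (peak list, precursor mass, scoring) that are omitted here.

```python
def metfrag_local_csv_params(csv_path):
    """Return parameter lines that select a local CSV candidate database.

    Only the database-related lines are produced; everything else a
    MetFrag run needs is deliberately left out of this sketch.
    """
    return [
        "MetFragDatabaseType = LocalCSV",
        f"LocalDatabasePath = {csv_path}",
    ]


# Hypothetical file name; substitute the CSV downloaded from the Zenodo record.
lines = metfrag_local_csv_params("PubChemLite_exposomics.csv")
params_text = "\n".join(lines) + "\n"
```

The resulting text could be appended to an existing parameter file so that the rest of the MetFrag configuration stays unchanged.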
Created:
2022-02-07



