Development of a Chemical Structure Comparison Method for Integrated Analysis of Chemical and Genomic Information in the Metabolic Pathways

Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Development_of_a_Chemical_Structure_Comparison_Method_for_Integrated_Analysis_of_Chemical_and_Genomic_Information_in_the_Metabolic_Pathways/3652431
Resource description:
Cellular functions result from intricate networks of molecular interactions, which involve not only proteins and nucleic acids but also small chemical compounds. Here we present an efficient algorithm for comparing two chemical structures of compounds, where the chemical structure is treated as a graph consisting of atoms as nodes and covalent bonds as edges. On the basis of the concept of functional groups, 68 atom types (node types) are defined for carbon, nitrogen, oxygen, and other atomic species with different environments, which has enabled detection of biochemically meaningful features. Maximal common subgraphs of two graphs can be found by searching for maximal cliques in the association graph, and we have introduced heuristics to accelerate the clique finding and to detect optimal local matches (simply connected common subgraphs). Our procedure was applied to the comparison and clustering of 9383 compounds, mostly metabolic compounds, in the KEGG/LIGAND database. The largest clusters of similar compounds were related to carbohydrates, and the clusters corresponded well to the categorization of pathways as represented by the KEGG pathway map numbers. When each pathway map was examined in more detail, finer clusters could be identified corresponding to subpathways or pathway modules containing continuous sets of reaction steps. Furthermore, it was found that the pathway modules identified by similar compound structures sometimes overlap with the pathway modules identified by genomic contexts, namely, by operon structures of enzyme genes.
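
The clique-based search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain element symbols as node labels rather than the 68 atom types, ignores bond orders, uses the textbook Bron-Kerbosch procedure with pivoting instead of the paper's heuristics, and does not enforce that the match is connected. All function and variable names are hypothetical.

```python
from itertools import combinations


def association_graph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Build the association graph of two labeled molecular graphs.

    atoms_*: dict mapping atom id -> label (here, an element symbol).
    bonds_*: set of frozenset({u, v}) pairs, one per covalent bond.
    Each node is a pair (a, b) of atoms with equal labels. Two nodes are
    adjacent when the pairing is consistent: the atoms are distinct in both
    molecules and are either bonded in both or unbonded in both.
    """
    nodes = [(a, b) for a in atoms_a for b in atoms_b
             if atoms_a[a] == atoms_b[b]]
    adj = {n: set() for n in nodes}
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a1 == a2 or b1 == b2:
            continue
        bonded_a = frozenset((a1, a2)) in bonds_a
        bonded_b = frozenset((b1, b2)) in bonds_b
        if bonded_a == bonded_b:
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield set(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def largest_common_subgraph(atoms_a, bonds_a, atoms_b, bonds_b):
    """Return the largest atom-to-atom matching as a set of (a, b) pairs."""
    adj = association_graph(atoms_a, bonds_a, atoms_b, bonds_b)
    return max(maximal_cliques(adj), key=len, default=set())


# Heavy-atom skeletons of ethanol (C-C-O) and 1-propanol (C-C-C-O).
ethanol_atoms = {1: "C", 2: "C", 3: "O"}
ethanol_bonds = {frozenset((1, 2)), frozenset((2, 3))}
propanol_atoms = {1: "C", 2: "C", 3: "C", 4: "O"}
propanol_bonds = {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))}

match = largest_common_subgraph(
    ethanol_atoms, ethanol_bonds, propanol_atoms, propanol_bonds
)
print(sorted(match))
```

In this toy case the largest clique pairs the C-C-O fragment of ethanol with the terminal C-C-O fragment of 1-propanol. Because each clique only guarantees pairwise consistency, a match can in general consist of several disconnected pieces, which is the situation the abstract's "simply connected common subgraphs" heuristic addresses.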

Created:
2016-08-18