Development of a Chemical Structure Comparison Method for Integrated Analysis of Chemical and Genomic Information in the Metabolic Pathways
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Development_of_a_Chemical_Structure_Comparison_Method_for_Integrated_Analysis_of_Chemical_and_Genomic_Information_in_the_Metabolic_Pathways/3652431
Description:
Cellular functions result from intricate networks of molecular interactions, which involve not only
proteins and nucleic acids but also small chemical compounds. Here we present an efficient algorithm for
comparing two chemical structures of compounds, where the chemical structure is treated as a graph
consisting of atoms as nodes and covalent bonds as edges. On the basis of the concept of functional
groups, 68 atom types (node types) are defined for carbon, nitrogen, oxygen, and other atomic species
with different environments, which has enabled detection of biochemically meaningful features. Maximal
common subgraphs of two graphs can be found by searching for maximal cliques in the association graph,
and we have introduced heuristics to accelerate the clique finding and to detect optimal local matches
(simply connected common subgraphs). Our procedure was applied to the comparison and clustering of
9383 compounds, mostly metabolic compounds, in the KEGG/LIGAND database. The largest clusters of
similar compounds were related to carbohydrates, and the clusters corresponded well to the categorization
of pathways as represented by the KEGG pathway map numbers. When each pathway map was examined
in more detail, finer clusters could be identified corresponding to subpathways or pathway modules containing
continuous sets of reaction steps. Furthermore, it was found that the pathway modules identified by similar
compound structures sometimes overlap with the pathway modules identified by genomic contexts, namely,
by operon structures of enzyme genes.
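
The association-graph step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain element symbols in place of the 68 atom types, ignores bond order, applies a basic Bron-Kerbosch search without the heuristics mentioned in the description, and does not enforce that the match is connected. The function names and the toy molecules (heavy atoms of ethanol and methanol) are assumptions chosen for illustration.

```python
from itertools import combinations


def association_graph(nodes_a, edges_a, nodes_b, edges_b):
    """Build the association graph of two labeled graphs.

    nodes_*: dict mapping atom id -> label (a stand-in for an atom type).
    edges_*: set of frozenset({u, v}) bonds.
    Each association node is a pair (a, b) of atoms with equal labels.
    Two pairs are adjacent when they map distinct atoms to distinct atoms
    and the bond status (bonded / not bonded) agrees in both graphs.
    """
    pairs = [(a, b) for a in nodes_a for b in nodes_b
             if nodes_a[a] == nodes_b[b]]
    adj = {p: set() for p in pairs}
    for (a1, b1), (a2, b2) in combinations(pairs, 2):
        if a1 == a2 or b1 == b2:
            continue  # an atom may be matched only once
        bonded_a = frozenset((a1, a2)) in edges_a
        bonded_b = frozenset((b1, b2)) in edges_b
        if bonded_a == bonded_b:
            adj[(a1, b1)].add((a2, b2))
            adj[(a2, b2)].add((a1, b1))
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with basic Bron-Kerbosch (no pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(frozenset(), set(adj), set())
    return cliques


def largest_common_subgraph(nodes_a, edges_a, nodes_b, edges_b):
    """Return the largest atom-to-atom matching found as a clique."""
    adj = association_graph(nodes_a, edges_a, nodes_b, edges_b)
    return max(maximal_cliques(adj), key=len, default=frozenset())


# Toy input: heavy atoms of ethanol (C-C-O) and methanol (C-O).
ethanol_nodes = {1: "C", 2: "C", 3: "O"}
ethanol_edges = {frozenset((1, 2)), frozenset((2, 3))}
methanol_nodes = {1: "C", 2: "O"}
methanol_edges = {frozenset((1, 2))}

match = largest_common_subgraph(ethanol_nodes, ethanol_edges,
                                methanol_nodes, methanol_edges)
print(sorted(match))  # [(2, 1), (3, 2)]
```

Each pair in the result maps an atom of the first structure to an atom of the second, so the shared C-O fragment is recovered here. Because maximal clique enumeration is exponential in the worst case, a sketch like this is only practical for small structures, which is presumably why the description mentions heuristics for acceleration.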
Created:
2016-08-18



