Development of a Chemical Structure Comparison Method for Integrated Analysis of Chemical and Genomic Information in the Metabolic Pathways
Source: NIAID Data Ecosystem (indexed 2026-03-06)
Download link:
https://figshare.com/articles/dataset/Development_of_a_Chemical_Structure_Comparison_Method_for_Integrated_Analysis_of_Chemical_and_Genomic_Information_in_the_Metabolic_Pathways/3652428
Description:
Cellular functions result from intricate networks of molecular interactions, which involve not only
proteins and nucleic acids but also small chemical compounds. Here we present an efficient algorithm for
comparing two chemical structures of compounds, where the chemical structure is treated as a graph
consisting of atoms as nodes and covalent bonds as edges. On the basis of the concept of functional
groups, 68 atom types (node types) are defined for carbon, nitrogen, oxygen, and other atomic species
with different environments, which has enabled detection of biochemically meaningful features. Maximal
common subgraphs of two graphs can be found by searching for maximal cliques in the association graph,
and we have introduced heuristics to accelerate the clique finding and to detect optimal local matches
(simply connected common subgraphs). Our procedure was applied to the comparison and clustering of
9383 compounds, mostly metabolic compounds, in the KEGG/LIGAND database. The largest clusters of
similar compounds were related to carbohydrates, and the clusters corresponded well to the categorization
of pathways as represented by the KEGG pathway map numbers. When each pathway map was examined
in more detail, finer clusters could be identified corresponding to subpathways or pathway modules containing
continuous sets of reaction steps. Furthermore, it was found that the pathway modules identified by similar
compound structures sometimes overlap with the pathway modules identified by genomic contexts, namely,
by operon structures of enzyme genes.
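
The association-graph idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses plain element symbols instead of the 68 atom types, a textbook Bron-Kerbosch search with pivoting instead of the accelerating heuristics, and it does not enforce that the matched atoms form a connected subgraph. The function names (`association_graph`, `maximal_cliques`, `bond_set`) and the two toy molecules are assumptions made for illustration only.

```python
from itertools import combinations


def bond_set(pairs):
    """Store undirected bonds as frozensets so (a, b) and (b, a) compare equal."""
    return {frozenset(p) for p in pairs}


def association_graph(atoms1, bonds1, atoms2, bonds2):
    """Build the association graph of two labeled molecular graphs.

    Nodes are atom pairs (i, j) with identical labels. Two nodes are adjacent
    when they use distinct atoms on both sides and the atoms are either bonded
    in both molecules or unbonded in both.
    """
    nodes = [(i, j) for i in atoms1 for j in atoms2 if atoms1[i] == atoms2[j]]
    adj = {n: set() for n in nodes}
    for (i, j), (k, l) in combinations(nodes, 2):
        if i == k or j == l:
            continue  # one atom cannot be matched twice
        if (frozenset((i, k)) in bonds1) == (frozenset((j, l)) in bonds2):
            adj[(i, j)].add((k, l))
            adj[(k, l)].add((i, j))
    return adj


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield set(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(adj), set())


# Toy input (heavy atoms only, hypothetical numbering): ethanol vs. 1-propanol.
ethanol_atoms = {0: "C", 1: "C", 2: "O"}
ethanol_bonds = bond_set([(0, 1), (1, 2)])
propanol_atoms = {0: "C", 1: "C", 2: "C", 3: "O"}
propanol_bonds = bond_set([(0, 1), (1, 2), (2, 3)])

graph = association_graph(ethanol_atoms, ethanol_bonds,
                          propanol_atoms, propanol_bonds)
# The largest clique gives the atom-to-atom mapping of the common substructure.
best = max(maximal_cliques(graph), key=len)
print(sorted(best))
```

Each clique in the association graph corresponds to a set of mutually consistent atom matches, so the largest clique here maps the C-C-O fragment of ethanol onto the matching end of 1-propanol. Exhaustive clique enumeration grows quickly with molecule size, which is presumably why the abstract mentions heuristics to accelerate the search.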
Created:
2016-08-18



