Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping
Source: PubMed Central. Updated: 2000-10-15. Indexed: 2026-05-16.
Download link:
https://pmc.ncbi.nlm.nih.gov/articles/PMC110780/
Summary:
We previously reported two graph algorithms for the analysis of genomic information: a graph comparison algorithm that detects locally similar regions, called correlated clusters, and an algorithm that finds a graph feature called P-quasi complete linkage. Based on these algorithms, we have developed an automatic procedure to detect conserved gene clusters and align orthologous gene orders in multiple genomes. In the first step, the graph comparison is applied to pairwise genome comparisons, where each genome is treated as a one-dimensionally connected graph with genes as its nodes, and correlated clusters of genes that share sequence similarities are identified. In the next step, the P-quasi complete linkage analysis is applied to group related clusters, and conserved gene clusters in multiple genomes are identified. In the last step, orthologous relations of genes are established within each conserved cluster. We analyzed 17 completely sequenced microbial genomes and obtained 2313 clusters when the completeness parameter P was 40%. About one quarter of these clusters contained at least two genes that appear in the metabolic and regulatory pathways in the KEGG database. This collection of conserved gene clusters is used to refine and augment ortholog group tables in KEGG and also to define ortholog identifiers as an extension of EC numbers.
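The grouping step can be illustrated with a small sketch. The summary does not define P-quasi complete linkage in detail, so the code below assumes the usual reading: a group qualifies when every member is linked to at least a fraction P of the other members. The function names, the greedy assignment strategy, and the toy data are all hypothetical and stand in for the authors' actual algorithm, which is not described here.

```python
def is_p_quasi_complete(members, edges, p):
    """True if every member is linked to at least a fraction p of the others.

    Assumed definition; `edges` is a set of frozenset pairs of node names.
    """
    members = set(members)
    if len(members) < 2:
        return True
    needed = p * (len(members) - 1)
    for m in members:
        linked = sum(
            1 for other in members
            if other != m and frozenset((m, other)) in edges
        )
        if linked < needed:
            return False
    return True


def greedy_p_quasi_groups(nodes, edges, p):
    """Greedy stand-in: put each node in the first group that stays P-quasi complete."""
    groups = []
    for node in nodes:
        for group in groups:
            if is_p_quasi_complete(group | {node}, edges, p):
                group.add(node)
                break
        else:
            # No existing group accepts the node, so it starts a new one.
            groups.append({node})
    return groups


# Hypothetical toy data: nodes stand for correlated clusters from pairwise
# genome comparisons; an edge means two clusters share similar genes.
links = {
    frozenset(pair)
    for pair in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d"), ("e", "f")]
}
groups = greedy_p_quasi_groups(["a", "b", "c", "d", "e", "f"], links, p=0.4)
strict_groups = greedy_p_quasi_groups(["a", "b", "c", "d", "e", "f"], links, p=1.0)
```

With P = 0.4, the four-node cycle a, b, c, d forms one group even though it is not fully connected, whereas P = 1.0 demands complete linkage and splits it into pairs. A lower P therefore tolerates missing similarity links between clusters at the cost of looser groups.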
Provider:
Oxford University Press
Created:
2000-10-15



