
A Semi-Quantitative, Synteny-Based Method to Improve Functional Predictions for Hypothetical and Poorly Annotated Bacterial and Archaeal Genes

Figshare, updated 2016-01-18, indexed 2026-04-29
Download link:
https://figshare.com/articles/dataset/A_Semi_Quantitative_Synteny_Based_Method_to_Improve_Functional_Predictions_for_Hypothetical_and_Poorly_Annotated_Bacterial_and_Archaeal_Genes/132190
Description:
During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.
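The idea of modeling the synteny/divergence trend and then looking for gene pairs with anomalously high synteny can be illustrated with a minimal sketch. The description above does not state the functional form of the model, so the exponential decay used here is an assumption made purely for illustration, as are the function names, the ratio threshold, and the synthetic numbers in the usage note. It is not the authors' implementation.

```python
import math

def fit_decay_rate(divergences, synteny_scores):
    """Fit log(s) = -k * d through the origin by least squares and return k.

    Assumes (hypothetically) that synteny decays exponentially with
    sequence divergence. Every synteny score must be > 0.
    """
    numerator = sum(d * -math.log(s) for d, s in zip(divergences, synteny_scores))
    denominator = sum(d * d for d in divergences)
    return numerator / denominator

def expected_synteny(divergence, k):
    """Synteny predicted by the assumed decay model at a given divergence."""
    return math.exp(-k * divergence)

def anomalous_pairs(pairs, k, ratio_threshold=2.0):
    """Return labels whose observed synteny exceeds the model's expectation.

    pairs: iterable of (label, divergence, observed_synteny).
    ratio_threshold is an illustrative cutoff, not a value from the source.
    """
    flagged = []
    for label, divergence, observed in pairs:
        if observed / expected_synteny(divergence, k) >= ratio_threshold:
            flagged.append(label)
    return flagged
```

With synthetic inputs, for example `k = fit_decay_rate([0.1, 0.5, 1.0], [math.exp(-0.2), math.exp(-1.0), math.exp(-2.0)])`, a call such as `anomalous_pairs([("pair_a", 1.0, math.exp(-2.0)), ("pair_b", 1.0, 0.5)], k)` keeps only the pair whose synteny is well above what the fitted trend predicts at that divergence. Such pairs are the candidates for inferred functional conservation.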

Created:
2016-01-18