A Semi-Quantitative, Synteny-Based Method to Improve Functional Predictions for Hypothetical and Poorly Annotated Bacterial and Archaeal Genes

Source: NIAID Data Ecosystem (indexed 2026-03-07)
Download link:
https://figshare.com/articles/dataset/A_Semi_Quantitative_Synteny_Based_Method_to_Improve_Functional_Predictions_for_Hypothetical_and_Poorly_Annotated_Bacterial_and_Archaeal_Genes/132190
Description:
During microbial evolution, genome rearrangement increases with increasing sequence divergence. If the relationship between synteny and sequence divergence can be modeled, gene clusters in genomes of distantly related organisms exhibiting anomalous synteny can be identified and used to infer functional conservation. We applied the phylogenetic pairwise comparison method to establish and model a strong correlation between synteny and sequence divergence in all 634 available Archaeal and Bacterial genomes from the NCBI database and four newly assembled genomes of uncultivated Archaea from an acid mine drainage (AMD) community. In parallel, we established and modeled the trend between synteny and functional relatedness in the 118 genomes available in the STRING database. By combining these models, we developed a gene functional annotation method that weights evolutionary distance to estimate the probability of functional associations of syntenous proteins between genome pairs. The method was applied to the hypothetical proteins and poorly annotated genes in newly assembled acid mine drainage Archaeal genomes to add or improve gene annotations. This is the first method to assign possible functions to poorly annotated genes through quantification of the probability of gene functional relationships based on synteny at a significant evolutionary distance, and has the potential for broad application.
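
The core idea in the description, modeling how synteny falls off with sequence divergence and then flagging genome pairs whose synteny is unexpectedly high, can be illustrated with a minimal sketch. The description does not state the model form, so the exponential decay fitted here (a straight line in log space), the z-score cutoff, and the function names `fit_log_linear` and `flag_anomalous` are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import math
from statistics import mean, pstdev


def fit_log_linear(divergence, synteny):
    """Least-squares fit of ln(synteny) = a + b * divergence.

    Assumption: synteny decays roughly exponentially with divergence.
    The source description does not specify the actual model.
    """
    ys = [math.log(s) for s in synteny]
    x_bar, y_bar = mean(divergence), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in divergence)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(divergence, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def flag_anomalous(divergence, synteny, z_cutoff=2.0):
    """Return indices of genome pairs with unexpectedly high synteny.

    A pair is flagged when its log-space residual exceeds z_cutoff
    standard deviations above the fitted trend. The cutoff value is
    an arbitrary illustrative choice.
    """
    a, b = fit_log_linear(divergence, synteny)
    residuals = [math.log(s) - (a + b * d)
                 for d, s in zip(divergence, synteny)]
    spread = pstdev(residuals)
    if spread == 0:
        return []
    return [i for i, r in enumerate(residuals) if r / spread > z_cutoff]
```

Calling `flag_anomalous(divergence, synteny)` with two equal-length lists, one divergence value and one positive synteny score per genome pair, returns the positions of pairs that sit well above the fitted trend. In the method described above, such pairs would then be weighted by evolutionary distance to estimate the probability of a functional association; that step is not sketched here.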

Created: 2011-10-20