Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods
收藏NIAID Data Ecosystem2026-03-06 收录
下载链接:
https://figshare.com/articles/dataset/Phylogenetic_and_Functional_Assessment_of_Orthologs_Inference___Projects_and_Methods/149057
下载链接
链接失效反馈官方服务:
资源简介:
Accurate genome-wide identification of orthologs is a central problem in
comparative genomics, a fact reflected by the numerous orthology identification
projects developed in recent years. However, only a few reports have compared
their accuracy, and indeed, several recent efforts have not yet been
systematically evaluated. Furthermore, orthology is typically only assessed in
terms of function conservation, despite the phylogeny-based original definition
of Fitch. We collected and mapped the results of nine leading orthology projects
and methods (COG, KOG, Inparanoid, OrthoMCL, Ensembl Compara, Homologene,
RoundUp, EggNOG, and OMA) and two standard methods (bidirectional best-hit and
reciprocal smallest distance). We systematically compared their predictions with
respect to both phylogeny and function, using six different tests. This required
the mapping of millions of sequences, the handling of hundreds of millions of
predicted pairs of orthologs, and the computation of tens of thousands of trees.
In phylogenetic analysis or in functional analysis where high specificity is
required, we find that OMA and Homologene perform best. At lower functional
specificity but higher coverage level, OrthoMCL outperforms Ensembl Compara, and
to a lesser extent Inparanoid. Lastly, the large coverage of the recent EggNOG
can be of interest to build broad functional grouping, but the method is not
specific enough for phylogenetic or detailed function analyses. In terms of
general methodology, we observe that the more sophisticated tree
reconstruction/reconciliation approach of Ensembl Compara was at times
outperformed by pairwise comparison approaches, even in phylogenetic tests.
Furthermore, we show that standard bidirectional best-hit often outperforms
projects with more complex algorithms. First, the present study provides
guidance for the broad community of orthology data users as to which database
best suits their needs. Second, it introduces new methodology to verify
orthology. And third, it sets performance standards for current and future
approaches.
创建时间:
2016-10-28



