Data from: Species tree estimation and the impact of gene loss following whole-genome duplication
Source: DataCite Commons. Updated 2026-03-13; indexed 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.prr4xgxmr
Description:
Whole-genome duplication (WGD) has been demonstrated to occur broadly and
repeatedly in the evolutionary history of eukaryotes, and is recognized as
a prominent evolutionary force, especially in plants. Immediately
following WGD, most genes are present in two copies as paralogs. Due to
this redundancy, one copy of a paralog pair commonly undergoes
pseudogenization and is eventually lost. When speciation occurs shortly
after WGD, however, differential loss of paralogs may lead to spurious
phylogenetic inference resulting from the inclusion of pseudoorthologs –
paralogous genes mistakenly identified as orthologs because they are present
in single copies within each sampled species. The influence and impact of
including pseudoorthologs versus true orthologs as a result of gene
extinction (or incomplete laboratory sampling) in a phylogenetic context
is only recently starting to gain empirical attention. Moreover, few of
these studies have investigated this phenomenon in an explicit
coalescent framework. Here, using mathematical models, numerous simulated
data sets, and two newly assembled empirical data sets, we assess the
effect of pseudoorthologs on species tree estimation under varying levels
of incomplete lineage sorting (ILS) and different patterns of gene loss
following WGD. When gene loss occurs in the terminal branches of the
species tree, the alignment-based (BPP) and gene-tree-based (ASTRAL,
MP-EST, and STAR) coalescent methods are adversely affected as the level
of ILS increases. This effect can be greatly mitigated by sampling a sufficiently
large number of genes. Under the same circumstances, however,
concatenation methods consistently estimate incorrect species trees as the
number of sampled genes increases. Furthermore, pseudoorthologs can
mislead species tree inference if gene loss occurs in the internal
branches of the species tree, where both coalescent and concatenation
methods are prone to produce inconsistent results. However,
pseudoorthologs are problematic when filtering only for single-copy genes
in phylogenomic data sets. Pruning orthologs or even randomly selecting a
copy from multi-copy genes can avoid most of those pseudoorthologs. These
results underscore the importance of understanding the influence of
pseudoorthologs in the phylogenomics era.
Provider:
Dryad
Created:
2022-06-15



