SimPhy configuration scripts for simulations reported in the study titled: Species tree inference methods intended to deal with incomplete lineage sorting are robust to the presence of paralogs
Source: DataCite Commons | Updated: 2025-04-01 | Indexed: 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.t76hdr81d
Description:
Many recent phylogenetic methods have focused on accurately inferring
species trees when there is gene tree discordance due to incomplete
lineage sorting (ILS). For almost all of these methods, and for
phylogenetic methods in general, the data for each locus is assumed to
consist of orthologous, single-copy sequences. Loci that are present in
more than a single copy in any of the studied genomes are excluded from
the data. These steps greatly reduce the number of loci available for
analysis. The question we seek to answer in this study is: What happens if
one runs such species tree inference methods on data where paralogy is
present, whether or not ILS is also present? Through simulation
studies and analyses of two large biological data sets, we show that
running such methods on data with paralogs can still provide accurate
results. We use multiple different methods, some of which are based
directly on the multispecies coalescent (MSC) model, and some of which
have been proven to be statistically consistent under it. We also treat
the paralogous loci in multiple ways: from explicitly denoting them as
paralogs, to randomly selecting one copy per species. In all cases the
inferred species trees are as accurate as equivalent analyses using
single-copy orthologs. Our results have significant implications for the
use of ILS-aware phylogenomic analyses, demonstrating that they do not
have to be restricted to single-copy loci. This will greatly increase the
amount of data that can be used for phylogenetic inference.
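
One of the treatments mentioned above, randomly selecting one copy per species, can be illustrated with a minimal sketch. This is not the authors' code: the function name, the `<species>_<copy id>` label format, and the example labels are all assumptions made for illustration only.

```python
import random
from collections import defaultdict


def sample_one_copy_per_species(leaf_labels, rng, sep="_"):
    """Return a dict mapping each species to one randomly chosen gene copy.

    Assumption (hypothetical): every leaf label has the form
    '<species><sep><copy id>', so the species is the text before the
    first separator.
    """
    copies_by_species = defaultdict(list)
    for label in leaf_labels:
        species = label.split(sep, 1)[0]
        copies_by_species[species].append(label)
    # Iterate over species in sorted order so that a given seed
    # always yields the same selection.
    return {
        species: rng.choice(copies_by_species[species])
        for species in sorted(copies_by_species)
    }


# Hypothetical locus with two copies in species A and three in species C.
labels = ["A_1", "A_2", "B_1", "C_1", "C_2", "C_3"]
chosen = sample_one_copy_per_species(labels, random.Random(0))
print(chosen)
```

The retained labels could then be used to prune the locus down to one sequence per species before running a species tree method that expects single-copy input.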
Provider:
Dryad
Created:
2021-07-12



