Data from: The relative importance of modeling site pattern heterogeneity versus partition-wise heterotachy in phylogenomic inference
DataCite Commons | Updated 2025-05-01 | Indexed 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.68qt629
Description:
Large taxa-rich genome-scale data sets are often necessary for resolving
ancient phylogenetic relationships. But accurate phylogenetic inference
requires that they are analyzed with realistic models that account for the
heterogeneity in substitution patterns amongst the sites, genes and
lineages. Two kinds of adjustments are frequently used: models that
account for heterogeneity in amino acid frequencies at sites in proteins,
and partitioned models that accommodate the heterogeneity in rates (branch
lengths) among different proteins in different lineages (protein-wise
heterotachy). Although partitioned and site-heterogeneous models are both
widely used in isolation, their relative importance to the inference of
correct phylogenies has not been carefully evaluated. We conducted several
empirical analyses and a large set of simulations to compare the relative
performances of partitioned models, site-heterogeneous models and combined
partitioned site-heterogeneous models. In general, site-homogeneous models
(partitioned or not) performed worse than site-heterogeneous models, except in
simulations with extreme protein-wise heterotachy. Furthermore,
simulations using empirically-derived realistic parameter settings showed
a marked long-branch attraction (LBA) problem for analyses employing
protein-wise partitioning even when the generating model included
partitioning. This LBA problem results from a small sample bias compounded
over many single protein alignments. In some cases, this problem was
ameliorated by clustering similarly-evolving proteins together into larger
partitions using the PartitionFinder method. Similar results were obtained
under simulations with larger numbers of taxa or heterogeneity in
simulating topologies over genes. For an empirical Microsporidia test data
set, all but one of the tested site-heterogeneous models (with or without
partitioning) obtained the correct Microsporidia+Fungi grouping, whereas
site-homogeneous models (with or without partitioning) did not. The single
exception was the fully partitioned site-heterogeneous analysis that
succumbed to the compounded small sample LBA bias. In general, unless
protein-wise heterotachy effects are extreme, it is more important to
model site-heterogeneity than protein-wise heterotachy in phylogenomic
analyses. Complete protein-wise partitioning should be avoided as it can
lead to a serious LBA bias. In cases of extreme protein-wise heterotachy,
approaches that cluster similarly-evolving proteins together, coupled
with site-heterogeneous models, work well for phylogenetic estimation.
Provider:
Dryad
Created:
2019-04-12



