Data from: Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling
DataCite Commons, updated 2025-05-01, indexed 2025-04-09
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.6536v
Description:
Phylogenomics, the use of large-scale data matrices in phylogenetic
analyses, has been viewed as the ultimate solution to the problem of
resolving difficult nodes in the tree of life. However, it has become
clear that analyses of these large genomic data sets can also result in
conflicting estimates of phylogeny. Here, we use the early divergences in
Neoaves, the largest clade of extant birds, as a “model system” to
understand the basis for incongruence among phylogenomic trees. We were
motivated by the observation that trees from two recent avian phylogenomic
studies exhibit conflicts. Those studies used different strategies: 1)
collecting many characters [42 mega base pairs (Mbp) of sequence data]
from 48 birds, sometimes including only one taxon for each major clade;
and 2) collecting fewer characters (0.4 Mbp) from 198 birds, selected to
subdivide long branches. However, the studies also used different data
types: the taxon-poor data matrix comprised 68% non-coding sequences
whereas coding exons dominated the taxon-rich data matrix. This difference
raises the question of whether the primary reason for incongruence is the
number of sites, the number of taxa, or the data type. To test among these
alternative hypotheses, we assembled a novel, large-scale data matrix
comprising 90% non-coding sequences from 235 bird species. Although
increased taxon sampling appeared to have a positive impact on
phylogenetic analyses, the most important variable was data type. Indeed,
by analyzing different subsets of the taxa in our data matrix, we found
that increased taxon sampling actually resulted in increased congruence
with the tree from the previous taxon-poor study (which had a majority of
non-coding data) instead of the taxon-rich study (which largely used
coding data). We suggest that the observed differences in the estimates of
topology for these studies reflect data-type effects due to violations of
the models used in phylogenetic analyses, some of which may be difficult
to detect. If incongruence among trees estimated using phylogenomic
methods largely reflects problems with model fit, developing more
“biologically-realistic” models is likely to be critical for efforts to
reconstruct the tree of life.
Provider:
Dryad
Created:
2017-03-23



