
Data from: Why concatenation fails near the anomaly zone

Source: DataONE | Updated: 2017-07-03 | Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
Genome-scale sequencing has been of great benefit in recovering species trees, but has not provided final answers. Despite the rapid accumulation of molecular sequences, resolving short and deep branches of the tree of life has remained a challenge, and has prompted the development of new strategies that can make the best use of available data. One such strategy – the concatenation of gene alignments – can be successful when coupled with many tree estimation methods, but has also been shown to fail when there are high levels of incomplete lineage sorting. Here, we focus on the failure of likelihood-based methods in retrieving a rooted, asymmetric four-taxon species tree from concatenated data when the species tree is in or near the anomaly zone – a region of parameter space where the most common gene tree does not match the species tree because of incomplete lineage sorting. First, we use coalescent theory to prove that most informative sites will support the species tree in the anomaly zone, and that as a consequence maximum-parsimony succeeds in recovering the species tree from concatenated data. We further show that maximum-likelihood tree estimation from concatenated data fails both inside and outside the anomaly zone, and that this failure cannot be easily predicted from the topology of the most common gene tree. We show that likelihood-based methods often fail in a region partially overlapping the anomaly zone, likely because of the lower relative cost of substitutions on discordant gene tree branches that are absent from the species tree. Our results confirm and extend previous reports on the performance of these methods applied to concatenated data from a rooted, asymmetric four-taxon species tree, and highlight avenues for future work on improving the performance of methods aimed at recovering species trees.
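
To make the anomaly zone concrete, here is a minimal sketch that tests whether a rooted, asymmetric four-taxon species tree (((A,B),C),D) lies inside it. The record itself gives no formula. The boundary function below is the one commonly attributed to Degnan and Rosenberg (2006) and is an assumption here, as are the function names and the convention that `x` is the deeper internal branch (ancestral to A, B and C) and `y` is the shallower one (ancestral to A and B), both in coalescent units.

```python
import math


def anomaly_boundary(x: float) -> float:
    """Boundary a(x) of the four-taxon anomaly zone (assumed formula).

    x is the length, in coalescent units, of the deeper internal branch
    of the species tree (((A,B),C),D). The formula is taken from the
    literature (Degnan and Rosenberg 2006), not from this dataset record.
    """
    if x <= 0:
        raise ValueError("x must be a positive branch length")
    e2x = math.exp(2 * x)
    e3x = math.exp(3 * x)
    return math.log(2 / 3 + (3 * e2x - 2) / (18 * (e3x - e2x)))


def in_anomaly_zone(x: float, y: float) -> bool:
    """Return True when the shallower internal branch y falls below a(x)."""
    if y < 0:
        raise ValueError("y must be a non-negative branch length")
    return y < anomaly_boundary(x)


if __name__ == "__main__":
    # Illustrative branch lengths only; they are not values from the dataset.
    for x, y in [(0.1, 0.1), (0.2, 0.5), (1.0, 1.0)]:
        print(f"x={x}, y={y}: in anomaly zone = {in_anomaly_zone(x, y)}")
```

Under this formula the boundary grows without bound as `x` approaches zero and turns negative once `x` exceeds roughly 0.27, so only trees with two short consecutive internal branches qualify. Note that the abstract reports maximum-likelihood failures both inside and outside this region, so the check describes the zone itself and does not predict where concatenation fails.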

Created: 2017-07-03