
ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes

Source: DataONE. Updated: 2023-06-08. Indexed: 2025-08-09.
Download link:
https://search.dataone.org/view/sha256:0b630ecadf4fd87b00400f9bd183e3ee49d8923dd7bdd8e17d6862c453534f11
Resource description:
Motivation: The estimation of species phylogenies requires multiple loci, since different loci can have different trees due to incomplete lineage sorting (ILS), modeled by the multi-species coalescent model. We recently developed a coalescent-based method, ASTRAL, which is statistically consistent under the multi-species coalescent model and which is more accurate than other coalescent-based methods on the datasets we examined. ASTRAL runs in polynomial time by constraining the search space to a set of allowed ‘bipartitions’. Despite this restriction, ASTRAL remains statistically consistent.

Results: We present a new version of ASTRAL, which we call ASTRAL-II. We show that ASTRAL-II has substantial advantages over ASTRAL: it is faster, can analyze much larger datasets (up to 1000 species and 1000 genes) and has substantially better accuracy under some conditions. ASTRAL’s running time is O(n^2k|X|^2), and ASTRAL-II’s running time is O(nk|X|^2), where n is the number of sp...,

We used SimPhy (https://github.com/adamallo/SimPhy) to simulate species trees and gene trees, and used INDELible (Fletcher and Yang, 2009) to simulate nucleotide sequences down the gene trees with varying length and model parameters. We estimated gene trees on these simulated gene alignments and then used the estimated gene trees in coalescent-based analyses. We simulated 11 model conditions, which we divide into two datasets, with one model condition appearing in both datasets. We used SimPhy to simulate species trees according to the Yule process, characterized by the number of taxa, the maximum tree length, and the speciation rate (this combination defines a model condition).

Dataset 1: In six model conditions, we fixed the number of taxa at 200 and varied the tree length (500 K, 2 M and 10 M generations) and the speciation rate (1e-6 and 1e-7 per generation). The tree length affects the amount of ILS: shorter trees have shorter branches and therefore higher levels of ILS. Speciation rate impacts whet...,
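
The Dataset 1 design and the running-time bounds quoted above can be illustrated with a small Python sketch. It is not part of the dataset or of ASTRAL itself. The dictionary keys and function names are hypothetical. The definitions of k and |X| are cut off in the description, so the sketch treats them as opaque positive numbers, and the values passed for them are arbitrary placeholders. The bounds are asymptotic, so the ratio computed here only shows how the two stated bounds scale relative to each other and is not a measured speedup.

```python
from itertools import product

# Values taken from the Dataset 1 description above.
NUM_TAXA = 200
TREE_LENGTHS = [500_000, 2_000_000, 10_000_000]  # generations (500 K, 2 M, 10 M)
SPECIATION_RATES = [1e-6, 1e-7]                  # per generation


def dataset1_conditions():
    """Return one dict per Dataset 1 model condition (key names are hypothetical)."""
    return [
        {"num_taxa": NUM_TAXA, "tree_length": length, "speciation_rate": rate}
        for length, rate in product(TREE_LENGTHS, SPECIATION_RATES)
    ]


def astral_bound(n, k, x_size):
    """Stated ASTRAL bound n^2 * k * |X|^2, ignoring constant factors."""
    return n**2 * k * x_size**2


def astral2_bound(n, k, x_size):
    """Stated ASTRAL-II bound n * k * |X|^2, ignoring constant factors."""
    return n * k * x_size**2


conditions = dataset1_conditions()
for condition in conditions:
    print(condition)

# k and |X| cancel in the ratio, so only n matters; 1000 and 5000 are
# arbitrary placeholder values, not figures from the dataset.
ratio = astral_bound(NUM_TAXA, 1000, 5000) // astral2_bound(NUM_TAXA, 1000, 5000)
print(len(conditions), ratio)
```

Crossing three tree lengths with two speciation rates gives the six model conditions of Dataset 1, and the ratio of the two stated bounds reduces to n, the one factor by which they differ.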

Created: 2025-07-22