Supplementary Figure 2
Source: DataONE | Updated: 2016-10-11 | Indexed: 2024-06-26
Download link:
https://search.dataone.org/view/null
Description:
The impact of missing data on quartet informativeness for simulated data (a-f) and the empirical Viburnum data set (g). This is an extension of Fig. 1. Data were simulated on three topologies: a balanced tree, an imbalanced tree, and the Viburnum topology with branch lengths scaled by penalized likelihood and the outgroup removed. The number of loci that are quartet informative for each split is shown under each tree. In the absence of missing data, all 1,000 simulated loci are informative about every edge (a-f; black circles). Under mutation-disruption (a-c), quartet information is lost faster in double-digest data (light grey) than in single-digest data (dark grey), and the effect varies with tree shape (see description in Fig. 1). Data simulated at low sequencing coverage (d-f) had either 50% (dark grey) or 80% (light grey) of data randomly missing. Here the effect of tree shape is more pronounced. Nearly all information is recovered across the deepest splits in the balanced topology (d) because of its hierarchical redundancy, but no data are recovered in the imbalanced topology (e), which does not increase in hierarchical redundancy across deeper edges. The empirical Viburnum topology is relatively balanced, and data simulated on this topology (c, f) appear similar to those simulated on the balanced topology (a, d). The true distribution of quartet informativeness recovered in the Viburnum RAD-seq data set (g) is similar to the expectation when data were simulated on this topology under low sequencing coverage (f).
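The effect of tree shape under random missing data (panels d-f) can be illustrated with a small simulation. This is a minimal sketch and not the authors' pipeline: it assumes a locus counts as quartet informative for an internal edge when it has data for at least one tip in each of the four clades around that edge, it drops each tip from each locus independently with a fixed probability, and it uses an arbitrary tree size of 32 tips. All function names (`tips`, `quartet_edges`, `balanced`, `caterpillar`, `informative_counts`) are hypothetical.

```python
import random


def tips(node):
    """Return the set of tip labels under a nested-tuple node."""
    if not isinstance(node, tuple):
        return frozenset([node])
    return tips(node[0]) | tips(node[1])


def quartet_edges(root):
    """Yield the four tip sets around each internal edge of the unrooted tree."""
    all_tips = tips(root)
    left, right = root
    # The two branches at the root merge into one edge once the tree is unrooted.
    if isinstance(left, tuple) and isinstance(right, tuple):
        yield [tips(left[0]), tips(left[1]), tips(right[0]), tips(right[1])]
    stack = [left, right]
    while stack:
        parent = stack.pop()
        if not isinstance(parent, tuple):
            continue
        for child, sibling in ((parent[0], parent[1]), (parent[1], parent[0])):
            if isinstance(child, tuple):
                # Clades: the child's two subtrees, its sibling, and the rest.
                yield [tips(child[0]), tips(child[1]),
                       tips(sibling), all_tips - tips(parent)]
            stack.append(child)


def balanced(labels):
    """Build a fully balanced tree (assumes len(labels) is a power of two)."""
    if len(labels) == 1:
        return labels[0]
    mid = len(labels) // 2
    return (balanced(labels[:mid]), balanced(labels[mid:]))


def caterpillar(labels):
    """Build a fully imbalanced (ladder-like) tree."""
    tree = labels[0]
    for label in labels[1:]:
        tree = (tree, label)
    return tree


def informative_counts(root, missing, n_loci=1000, seed=0):
    """Count, for each internal edge, the loci that are quartet informative."""
    rng = random.Random(seed)
    edges = list(quartet_edges(root))
    all_tips = sorted(tips(root))
    counts = [0] * len(edges)
    for _ in range(n_loci):
        # Assumption: each tip is missing from a locus independently.
        present = {t for t in all_tips if rng.random() >= missing}
        for i, clades in enumerate(edges):
            if all(clade & present for clade in clades):
                counts[i] += 1
    return counts


if __name__ == "__main__":
    labels = list(range(32))  # arbitrary tree size chosen for illustration
    for name, tree in (("balanced", balanced(labels)),
                       ("imbalanced", caterpillar(labels))):
        for missing in (0.5, 0.8):
            print(name, missing, sorted(informative_counts(tree, missing)))
```

In this sketch every edge of the imbalanced tree is flanked by at least two single-tip clades, so a locus must retain both of those specific tips to be informative. The deepest edge of the balanced tree is flanked by four large clades, any member of which can supply the data, which is one way to read the "hierarchical redundancy" described in the caption.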
Created:
2016-10-11



