Data from: Experimental design in phylogenetics: testing predictions from expected information
Source: DataONE | Updated: 2012-02-28 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Taxon and character sampling is central to phylogenetic experimental design, yet we lack general rules. Goldman introduced a method to construct efficient sampling designs in phylogenetics, based on the calculation of expected Fisher information given a probabilistic model of sequence evolution. The considerable potential of this approach remains largely unexplored. In an earlier study, we applied Goldman’s method to a problem in the phylogenetics of caecilian amphibians and made an a priori evaluation and testable predictions of which taxon additions would increase information about a particular weakly supported branch of the caecilian phylogeny by the greatest amount. Using mitogenomic and rag1 sequences (some newly determined for this study) from additional caecilian species, we studied how information (both expected and observed) and bootstrap support vary as each new taxon is individually added, providing the first empirical test of specific predictions made using Goldman’s method for phylogenetic experimental design. Our results empirically validate the top three (more intuitive) taxon addition predictions made in our previous study, but only information results validate unambiguously the fourth (less intuitive) prediction. This highlights a complex relationship between information and support, reflecting that each measures different things: information is related to the ability to estimate branch length accurately, and support to the ability to estimate the tree topology accurately. Thus, an increase in information may be correlated with, but does not necessitate, an increase in support. Our results also provide the first empirical validation of the widely held intuition that additional taxa that join the tree proximal to poorly supported internal branches are more informative and enhance support more than additional taxa that join the tree more distally.
Our work supports the view that adding more data for a single (well chosen) taxon may increase phylogenetic resolution and support in weakly supported parts of the tree without adding more characters/genes while illustrating that less well chosen taxon additions can have the opposite effect. Altogether our results corroborate that, although still underexplored, Goldman’s method offers a powerful tool for experimental design in molecular phylogenetic studies. However, there are still several drawbacks to overcome, and further assessment of the method is needed in order to make it better understood, more accessible, and able to assess additions of multiple taxa.
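As a rough illustration of what "expected Fisher information given a probabilistic model of sequence evolution" means, the sketch below computes the expected information about a single branch length for the simplest possible case: two aligned sequences under the Jukes-Cantor (JC69) model. This is a toy example chosen here for clarity, not the procedure used in the study, which evaluates information for a branch within a full tree under a fitted model. The function names and the `n_sites` parameter are assumptions made for illustration.

```python
import math


def jc69_p_diff(t):
    """Probability that two sequences differ at a site under JC69.

    t is the branch length separating them, in expected
    substitutions per site.
    """
    return 0.75 * (1.0 - math.exp(-4.0 * t / 3.0))


def jc69_expected_information(t, n_sites=1):
    """Expected Fisher information about t from n_sites independent sites.

    Under JC69 the 16 site patterns for two sequences collapse to
    "same" vs "different", so each site is a Bernoulli trial with
    success probability p(t). The per-site information is then
    (dp/dt)^2 / (p * (1 - p)).
    """
    if t <= 0:
        raise ValueError("branch length must be positive")
    p = jc69_p_diff(t)
    dp_dt = math.exp(-4.0 * t / 3.0)  # derivative of p with respect to t
    return n_sites * dp_dt ** 2 / (p * (1.0 - p))


if __name__ == "__main__":
    # Hypothetical branch lengths, chosen only to show the trend.
    for t in (0.05, 0.2, 1.0, 3.0):
        info = jc69_expected_information(t, n_sites=1000)
        # 1 / info approximates the variance of the branch-length estimate.
        print(f"t = {t:4.2f}  expected information = {info:10.3f}")
```

In this toy setting the information falls as the branch gets longer (sites saturate), and its inverse approximates the variance of the branch-length estimate. That is the sense in which information relates to estimating branch length rather than topology.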
Created:
2012-02-28



