Data from: Bias in tree searches and its consequences for measuring group supports

Mendeley Data · Updated 2024-06-25 · Indexed 2024-06-29

Download link:

https://zenodo.org/records/4969514

Description:

When doing a bootstrap analysis with a single tree saved per pseudoreplicate, biased search algorithms may influence support values more than actual properties of the data set. Two methods commonly used for finding phylogenetic trees consist of randomizing the input order of species in multiple addition sequences followed by branch swapping, or using random trees as the starting point for branch swapping. The randomness inherent to such methods is assumed to eliminate any consistent preferences for some trees or unsupported groups of taxa, but both methods can be significantly biased. In the case of trees created by sequentially adding taxa, a bias may occur even if every addition sequence is equiprobable, and if one of the equally optimal positions for each terminal to add to the tree is selected equiprobably. In the case of branch swapping, the bias can happen even when branch swapping equiprobably selects any of the trees of better score in the SPR-neighborhood or TBR-neighborhood. Consequently, when the data set is ambiguous, both random-addition sequences and branch swapping from random trees may (a) find some of the optimal trees much more frequently than others, and (b) find some groups with a frequency that differs from their frequency among all optimal trees. When the data set defines a single optimal tree, the groups present in that tree may have a different probability of being found by a search, even if supported by equal amounts of evidence. This may happen in both parsimony and maximum-likelihood analyses, and even in small data sets without incongruence.
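Point (b) above can be checked by comparing how often each group appears among the trees a search returns with how often it appears among all optimal trees. The following is a minimal sketch of that comparison, not the authors' code. It assumes each tree is represented as a set of groups and each group as a `frozenset` of taxon names. The function names (`group_frequencies`, `frequency_gaps`) and the toy trees are hypothetical and are not taken from the data set.

```python
from collections import Counter
from fractions import Fraction


def group_frequencies(trees):
    """Return {group: fraction of the given trees that contain it}."""
    trees = list(trees)
    if not trees:
        return {}
    # set(tree) guards against a group being counted twice for one tree.
    counts = Counter(group for tree in trees for group in set(tree))
    return {group: Fraction(n, len(trees)) for group, n in counts.items()}


def frequency_gaps(found_trees, optimal_trees):
    """Frequency among found trees minus frequency among all optimal trees."""
    found = group_frequencies(found_trees)
    expected = group_frequencies(optimal_trees)
    groups = set(found) | set(expected)
    return {
        g: found.get(g, Fraction(0)) - expected.get(g, Fraction(0))
        for g in groups
    }


# Hypothetical toy input: three equally optimal trees on taxa A to E,
# each written as the set of groups it contains.
t1 = frozenset({frozenset("AB"), frozenset("CD")})
t2 = frozenset({frozenset("AB"), frozenset("DE")})
t3 = frozenset({frozenset("AC"), frozenset("DE")})
optimal = [t1, t2, t3]

# Hypothetical outcome of ten search replicates, one tree saved per replicate.
found = [t1] * 6 + [t2] * 3 + [t3] * 1

gaps = frequency_gaps(found, optimal)
for group, gap in sorted(gaps.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(group)), gap)
```

In this made-up input, group AB occurs in 2/3 of the optimal trees but in 9/10 of the saved trees, so its gap is positive. An unbiased search would be expected to give gaps near zero as the number of replicates grows.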

Created:

2023-06-28