Data from: How well can we detect lineage-specific diversification-rate shifts? A simulation study of sequential AIC methods

Source: DataONE | 2016-03-30 | Updated 2024-06-26

Download link:

https://search.dataone.org/view/null

Summary:

Evolutionary biologists have long been fascinated by the extreme differences in species numbers across branches of the Tree of Life. This has motivated the development of statistical methods for detecting shifts in the rate of lineage diversification across the branches of phylogenetic trees. One of the most frequently used methods, MEDUSA, explores a set of diversification-rate models, where each model assigns branches of the phylogeny to a set of diversification-rate categories. Each model is first fit to the data, and the Akaike Information Criterion (AIC) is then used to identify the optimal diversification model. Surprisingly, the statistical behavior of this popular method has not been characterized, which is a concern in light of: (1) the poor performance of the AIC as a means of choosing among models in other phylogenetic contexts; (2) the ad hoc algorithm used to visit diversification models; and (3) errors that we reveal in the likelihood function used to fit diversification models to phylogenetic data. Here, we perform an extensive simulation study demonstrating that MEDUSA (1) has a high false-discovery rate (on average, spurious diversification-rate shifts are identified ≈30% of the time) and (2) provides biased estimates of diversification-rate parameters. Understanding the statistical behavior of MEDUSA is critical both to empirical researchers, who need to know whether these methods can make reliable inferences from empirical datasets, and to theoretical biologists, who need to identify the specific problems that must be solved to develop more reliable approaches for detecting shifts in the rate of lineage diversification.
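
The AIC step described above can be illustrated with a minimal sketch. It assumes each candidate model has already been fit and reports its maximized log-likelihood ln L and its number of free parameters k, and it scores each model as AIC = 2k - 2 ln L, keeping the lowest. This is a generic illustration of AIC-based selection, not MEDUSA's code or its model-search algorithm; the model names, parameter counts (e.g. one speciation and one extinction rate per rate category), and log-likelihood values are hypothetical placeholders, not results from the study.

```python
"""Minimal sketch of AIC-based selection among diversification-rate models.

All model names, parameter counts, and log-likelihoods here are hypothetical
placeholders used only to illustrate the AIC comparison.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class FittedModel:
    name: str              # hypothetical label for a rate-category assignment
    log_likelihood: float  # maximized log-likelihood, ln L
    n_params: int          # number of free parameters, k


def aic(model: FittedModel) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * model.n_params - 2 * model.log_likelihood


def select_by_aic(models: list[FittedModel]) -> tuple[FittedModel, dict[str, float]]:
    """Return the lowest-AIC model and each model's AIC difference from it."""
    scores = {m.name: aic(m) for m in models}
    best = min(models, key=aic)
    deltas = {name: score - scores[best.name] for name, score in scores.items()}
    return best, deltas


if __name__ == "__main__":
    # Hypothetical fits: a single background rate category (assumed k = 2)
    # versus one additional rate category on a subclade (assumed k = 4).
    candidates = [
        FittedModel("no shift", log_likelihood=-120.0, n_params=2),
        FittedModel("one shift", log_likelihood=-115.0, n_params=4),
    ]
    best, deltas = select_by_aic(candidates)
    print(best.name, deltas)  # one shift {'no shift': 6.0, 'one shift': 0.0}
```

Here the extra rate category improves ln L by 5 units at a cost of 2 parameters, so its AIC is 6 units lower and it is selected. Because the penalty per added parameter is fixed, a search that repeatedly adds shifts whenever AIC decreases can accept spurious shifts, which is the kind of behavior a simulation study of false-discovery rates is designed to measure.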

Created:

2016-03-30