Data from: Information criteria for comparing partition schemes
Source: DataONE
Updated: 2017-12-21
Indexed: 2024-06-26
Download link: https://search.dataone.org/view/null
Summary:
When inferring phylogenies, one important decision is whether and how nucleotide substitution parameters should be shared across different subsets or partitions of the data. One kind of partitioning error occurs when heterogeneous subsets are mistakenly lumped together and treated as if they share parameter values. The opposite kind of error is mistakenly treating homogeneous subsets as if they result from distinct sets of parameters. Lumping and splitting errors are not equally bad. Lumping errors can yield parameter estimates that do not accurately reflect any of the subsets that were combined, whereas splitting errors yield estimates that do not benefit from sharing information across partitions. Phylogenetic partitioning decisions are often made by applying information criteria such as the Akaike Information Criterion (AIC). As with other information criteria, the AIC evaluates a model or partition scheme by combining the maximum log-likelihood value with a penalty that depends on the number of parameters being estimated. For the purpose of selecting an optimal partitioning scheme, we derive an adjustment to the AIC, which we refer to as the AICP, motivated by the idea that splitting errors are less serious than lumping errors. We also introduce a similar adjustment to the Bayesian Information Criterion (BIC), which we refer to as the BICP. Via simulation and empirical data analysis, we contrast the behavior of the AIC and BIC with that of our suggested adjustments. We discuss these results and also emphasize why we expect the probability of lumping errors under the AICP and the BICP to be relatively robust to model parameterization.
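The way the standard AIC and BIC score a partition scheme can be illustrated with a minimal sketch. This summary does not give the AICP or BICP formulas, so the sketch compares a lumped and a split scheme using only the standard criteria, AIC = -2 ln L + 2k and BIC = -2 ln L + k ln n, where lower is better.

Everything in the sketch is hypothetical: the class SubsetFit, the functions aic, bic and score_scheme, the log-likelihood values, and the choice of n as the number of alignment sites. It also assumes each subset is fit independently, so log-likelihoods and parameter counts simply add. It ignores any parameters, such as a shared tree, that a real partitioned analysis would count only once.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class SubsetFit:
    """Hypothetical fit summary for one subset: maximized ln L and number of free parameters."""
    log_likelihood: float
    n_params: int


def aic(log_likelihood: float, n_params: int) -> float:
    # AIC = -2 ln L + 2k; lower is better.
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    # BIC = -2 ln L + k ln n; n is taken to be the number of alignment sites (an assumption).
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def score_scheme(subsets: list[SubsetFit], n_sites: int) -> dict[str, float]:
    # Assumes subsets are fit independently, so ln L values and parameter counts add up.
    total_lnl = sum(s.log_likelihood for s in subsets)
    total_k = sum(s.n_params for s in subsets)
    return {"AIC": aic(total_lnl, total_k), "BIC": bic(total_lnl, total_k, n_sites)}


# Made-up numbers: one model for all 1000 sites vs. a separate model per subset.
lumped = [SubsetFit(log_likelihood=-5000.0, n_params=10)]
split = [SubsetFit(-2480.0, 10), SubsetFit(-2505.0, 10)]

for name, scheme in [("lumped", lumped), ("split", split)]:
    scores = score_scheme(scheme, n_sites=1000)
    print(name, {k: round(v, 1) for k, v in scores.items()})
```

With these made-up numbers the AIC favors splitting (10010 vs. 10020), while the BIC favors lumping (about 10069 vs. 10108). The BIC's per-parameter penalty, ln 1000 ≈ 6.9, exceeds the AIC's penalty of 2.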
Created: 2017-12-21



