Data from: Assessing parameter identifiability in phylogenetic models using Data Cloning

DataONE; updated 2012-06-28; indexed 2024-06-27

Download link:

https://search.dataone.org/view/null


Description:

The success of model-based methods in phylogenetics has motivated much research aimed at generating new, biologically informative models. These new computer-intensive approaches to phylogenetics demand validation studies and sound measures of performance. To date such work has consisted only of simulation studies, estimation of known phylogenies, and difficult mathematical analyses assessing the estimability of parameters. Little practical guidance has been available to practitioners and theoreticians alike as to when and why the parameters in a particular model can be identified reliably. Here, we illustrate how Data Cloning (DC), a recently developed methodology for computing Maximum Likelihood estimates along with their asymptotic variance, can be used to diagnose structural parameter non-identifiability (NI) and to distinguish it from other parameter estimability problems, including the case where parameters are structurally identifiable but not estimable in a given data set (INE), and the case where parameters are identifiable and estimable, but only weakly so (WE). The application of the DC theorem uses well-known and widely used Bayesian computational techniques. With the DC approach, practitioners can use any Bayesian phylogenetics software to diagnose non-identifiability. Theoreticians and practitioners alike now have a powerful tool to detect non-identifiability while investigating complex modeling scenarios, where getting closed-form expressions in a probabilistic study is complicated. Furthermore, we show how DC can be used as a tool to examine and eliminate the influence of the priors, in particular if the process of prior elicitation is not straightforward. Finally, when applied to phylogenetic inference, DC can be used to study at least two important statistical questions: assessing identifiability of discrete parameters, like the tree topology, and developing efficient sampling methods for computationally expensive posterior densities.
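The diagnostic idea in the description can be illustrated with a small sketch. It is not taken from the dataset or the paper's code, and it is not a phylogenetic model. It assumes a toy Gaussian model, y_i ~ Normal(a + b, 1), in which only the sum a + b is identifiable, and the function name cloned_posterior_variances is hypothetical. Cloning the data K times raises the likelihood to the power K, and the posterior variance of an identifiable quantity should then shrink at roughly the rate 1/K, while that of a non-identifiable parameter should not shrink to zero.

```python
def cloned_posterior_variances(n_obs, n_clones, prior_var=1.0):
    """Return (Var[a], Var[a + b]) under the K-cloned posterior.

    Toy model (an assumption for illustration): y_i ~ Normal(a + b, 1),
    i = 1..n_obs, with independent Normal(0, prior_var) priors on a and b.
    Cloning the data K times raises the likelihood to the power K, so the
    posterior stays Gaussian and its precision matrix is
        (1 / prior_var) * I + K * n_obs * [[1, 1], [1, 1]].
    """
    t = 1.0 / prior_var          # prior precision of each parameter
    w = n_clones * n_obs         # likelihood precision for a + b after cloning
    diag, off = t + w, w         # entries of the 2x2 precision matrix
    det = diag * diag - off * off
    var_a = diag / det           # diagonal entry of the inverse matrix
    cov_ab = -off / det          # off-diagonal entry of the inverse matrix
    var_sum = 2.0 * var_a + 2.0 * cov_ab   # Var[a + b] = Var[a] + Var[b] + 2 Cov[a, b]
    return var_a, var_sum


# Compare how the two variances respond to an increasing number of clones.
for k in (1, 10, 100):
    var_a, var_sum = cloned_posterior_variances(n_obs=20, n_clones=k)
    print(f"K={k:>3}  Var[a]={var_a:.4f}  Var[a+b]={var_sum:.6f}")
```

In this sketch Var[a + b] falls in proportion to roughly 1/K, whereas Var[a] levels off near prior_var / 2, which is the signature of structural non-identifiability described above. In a real analysis the cloned posterior would typically be sampled by MCMC in Bayesian software rather than computed in closed form.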


Created:

2012-06-28
