
Data from: Cophylogeny Reconstruction via an Approximate Bayesian Computation

Source: DataONE. Updated 2014-12-26; indexed 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
Despite an increasingly vast literature on cophylogenetic reconstructions for studying host-parasite associations, understanding the common evolutionary history of such systems remains a problem that is far from being solved. Most algorithms for host-parasite reconciliation use an event-based model, where the events generally include (a subset of) cospeciation, duplication, loss, and host switch. All known parsimonious event-based methods then assign a cost to each type of event in order to find a reconstruction of minimum cost. The main problem with this approach is that the cost of the events strongly influences the reconciliation obtained. Some earlier approaches attempt to avoid this problem by finding a Pareto set of solutions and hence by considering event costs under some minimisation constraints. To deal with this problem, we developed an algorithm, called Coala, for estimating the frequency of the events based on an approximate Bayesian computation approach. The benefits of this method are twofold: (1) it provides more confidence in the set of costs to be used in a reconciliation, and (2) it allows estimation of the frequency of the events in cases where the dataset consists of trees with a large number of taxa. We evaluate our method on simulated and on biological datasets. We show that in both cases, for the same pair of host and parasite trees, different sets of frequencies for the events lead to equally probable solutions. Moreover, these solutions often differ greatly in the number of inferred events. It appears crucial to take this into account before attempting any further biological interpretation of such reconciliations. More generally, we also show that the set of frequencies can vary widely depending on the input host and parasite trees. Indiscriminately applying a standard vector of costs may thus not be a good strategy.
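
To make the idea of estimating event frequencies by approximate Bayesian computation more concrete, here is a minimal rejection-sampling sketch in Python. It is not the Coala implementation and should not be read as the authors' method: the description indicates that Coala works with host and parasite trees, whereas this toy compares plain vectors of event counts. The flat Dirichlet prior, the count-based simulator, the L1 distance, the tolerance, and the `observed` counts are all assumptions chosen purely for illustration.

```python
import random

# The four event types named in the description.
EVENTS = ("cospeciation", "duplication", "loss", "host_switch")


def sample_prior(rng):
    """Draw an event-frequency vector from a flat Dirichlet prior (assumption)."""
    draws = [rng.gammavariate(1.0, 1.0) for _ in EVENTS]
    total = sum(draws)
    return [d / total for d in draws]


def simulate_counts(freqs, n_events, rng):
    """Toy simulator (assumption): draw n_events independent events."""
    picks = rng.choices(range(len(EVENTS)), weights=freqs, k=n_events)
    return [picks.count(i) for i in range(len(EVENTS))]


def distance(simulated, observed):
    """L1 distance between two count vectors of equal total, scaled to [0, 1]."""
    l1 = sum(abs(s - o) for s, o in zip(simulated, observed))
    return l1 / (2 * sum(observed))


def abc_rejection(observed, n_draws, tolerance, seed=0):
    """Keep the prior draws whose simulated counts fall within the tolerance."""
    rng = random.Random(seed)
    n_events = sum(observed)
    accepted = []
    for _ in range(n_draws):
        freqs = sample_prior(rng)
        simulated = simulate_counts(freqs, n_events, rng)
        if distance(simulated, observed) <= tolerance:
            accepted.append(freqs)
    return accepted


# Hypothetical observed event counts, invented for this example.
observed = [12, 3, 5, 4]
accepted = abc_rejection(observed, n_draws=5000, tolerance=0.15)

if accepted:
    # Approximate posterior mean of each event frequency.
    for i, name in enumerate(EVENTS):
        mean = sum(freqs[i] for freqs in accepted) / len(accepted)
        print(f"{name}: {mean:.3f}")
print(f"accepted {len(accepted)} of 5000 draws")
```

The accepted vectors approximate a posterior distribution over event frequencies. Inspecting their spread, rather than a single best vector, is in the spirit of the observation above that different sets of frequencies can lead to equally probable solutions.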

Created:
2014-12-26