
Data from: Inferring heterogeneous evolutionary processes through time: from sequence substitution to phylogeography

Source: DataONE. Updated: 2014-02-28. Indexed: 2024-06-27.
Download link: https://search.dataone.org/view/null
Description:
Molecular phylogenetic and phylogeographic reconstructions generally assume time-homogeneous substitution processes. Motivated by computational convenience, this assumption sacrifices biological realism and offers little opportunity to uncover the temporal dynamics in evolutionary histories. Here, we propose an evolutionary approach that explicitly relaxes the time-homogeneity assumption by allowing the specification of different infinitesimal substitution rate matrices across different time intervals, called epochs, along the evolutionary history. We focus on an epoch model implementation in a Bayesian inference framework that offers great modeling flexibility in drawing inference about any discrete data type characterized as a continuous-time Markov chain, including phylogeographic traits. To alleviate the computational burden that the additional temporal heterogeneity imposes, we adopt a massively parallel approach that achieves both fine- and coarse-grain parallelization of the computations across branches that accommodate epoch transitions, making extensive use of graphics processing units. Through synthetic examples, we assess model performance in recovering evolutionary parameters from data generated according to different evolutionary scenarios that comprise different numbers of epochs for both nucleotide and codon substitution processes. We illustrate the usefulness of our inference framework in two different applications to empirical data sets: the selection dynamics on within-host HIV populations throughout infection and the seasonality of global influenza circulation. In both cases, our epoch model captures key features of temporal heterogeneity that remained difficult to test using ad hoc procedures.
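The core idea of the epoch model described above can be illustrated with a small sketch: when a branch crosses one or more epoch boundaries, its finite-time transition probability matrix is the ordered product of the matrix exponentials of each epoch's rate matrix, each scaled by the time the branch spends in that epoch. The code below is not the authors' implementation (which the abstract describes as a GPU-parallel Bayesian framework). The function names, the convention that time runs forward from parent to child, and the plain-Python matrix exponential are all assumptions made for illustration.

```python
import math

# Illustrative sketch only: function names and conventions are hypothetical,
# not taken from the dataset or the associated software.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(q, t, squarings=10, terms=10):
    """Approximate exp(Q * t) by scaling and squaring with a truncated Taylor series."""
    n = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(n)] for i in range(n)]
    result = identity(n)
    term = identity(n)
    for k in range(1, terms + 1):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def epoch_transition_matrix(rate_matrices, boundaries, t_start, t_end):
    """Transition probabilities along a branch from t_start to t_end.

    Assumption: time runs forward from parent to child, rate_matrices[k]
    applies between boundaries[k-1] and boundaries[k], and the first and
    last epochs extend to -infinity and +infinity respectively.
    """
    edges = [-math.inf] + list(boundaries) + [math.inf]
    p = identity(len(rate_matrices[0]))
    for k, q in enumerate(rate_matrices):
        lo = max(t_start, edges[k])
        hi = min(t_end, edges[k + 1])
        if hi > lo:
            # Chronological order: earlier epochs multiply on the left.
            p = mat_mul(p, mat_exp(q, hi - lo))
    return p

# Hypothetical two-state example: a fast epoch before time 1.0, a slow one after.
q_fast = [[-1.0, 1.0], [1.0, -1.0]]
q_slow = [[-0.1, 0.1], [0.1, -0.1]]
p = epoch_transition_matrix([q_fast, q_slow], [1.0], 0.5, 2.0)
```

Here the branch spends 0.5 time units in the fast epoch and 1.0 in the slow epoch. A time-homogeneous model would instead apply a single rate matrix over the whole branch length of 1.5.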

Created: 2014-02-28