
An efficient coalescent model for heterochronously sampled molecular data

Source: DataCite Commons. Updated: 2024-04-21. Indexed: 2024-08-19.
Download link:
https://tandf.figshare.com/articles/dataset/An_efficient_coalescent_model_for_heterochronously_sampled_molecular_data/25443744/1
Description:
Molecular sequence variation at a locus informs about the evolutionary history of the sample and past population size dynamics. The Kingman coalescent is used in a generative model of molecular sequence variation to infer evolutionary parameters. However, it is well understood that inference under this model does not scale well with sample size. Here, we build on recent work based on a lower resolution coalescent process, the Tajima coalescent, to model longitudinal samples. While the Kingman coalescent models the ancestry of labeled individuals, we model the ancestry of individuals labeled by their sampling time. We propose a new inference scheme for the reconstruction of effective population size trajectories based on this model and the infinite-sites mutation model. Modeling of longitudinal samples is necessary for applications (e.g., ancient DNA and RNA from rapidly evolving pathogens like viruses) and statistically desirable (variance reduction and parameter identifiability). We propose an efficient algorithm to calculate the likelihood and employ a Bayesian nonparametric procedure to infer the population size trajectory. We provide a new MCMC sampler to explore the space of heterochronous Tajima’s genealogies and model parameters. We compare our procedure with state-of-the-art methodologies in simulations and an application to ancient bison DNA sequences.

Provider:
Taylor & Francis
Created:
2024-03-20