
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

Source: Figshare; updated 2017-05-19; indexed 2026-04-29
Download link:
https://figshare.com/articles/dataset/Simultaneous_inference_of_phylogenetic_and_transmission_trees_in_infectious_disease_outbreaks/5018969
Description:
Whole-genome sequencing of pathogens from host samples is becoming increasingly routine during infectious disease outbreaks. These data provide information on possible transmission events, which can be used for further epidemiologic analyses such as identifying risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To resolve transmission events properly, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications either make simplifying assumptions that often break the dependencies among the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstructing transmission trees from sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with Markov chain Monte Carlo (MCMC), for which we have designed novel proposal steps that efficiently traverse the posterior distribution while accounting for all unobserved processes at once. This allows efficient sampling of transmission trees from the posterior distribution and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package, phybreak. The method performs well in tests on both new and previously published simulated data. We apply the model to five datasets from densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. Using only sampling times and sequences as data, our analyses confirmed or improved on the original results: the more realistic infection times lend more confidence to the inferred transmission trees.
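The abstract mentions estimating consensus transmission trees from the posterior sample. The Python sketch below is a rough illustration only. For each host it takes the most frequently sampled infector across posterior transmission-tree samples and reports the fraction of samples that support it.

Several parts of it are assumptions, not features of phybreak:
- The input format is hypothetical: one dict per sample mapping host to infector, with None for the index case.
- The function name `majority_infectors` is hypothetical.
- This is not phybreak's API or its consensus method.

A per-host majority is not guaranteed to form a valid tree, because it can contain cycles or several index cases. For that reason, methods such as Edmonds' maximum spanning arborescence algorithm are often used instead.

```python
from collections import Counter

def majority_infectors(samples):
    """Summarise posterior transmission-tree samples host by host.

    samples: non-empty list of dicts, one per posterior sample, mapping each
    host to its sampled infector (None for the index case). Hypothetical format.
    Returns host -> (most frequent infector, fraction of samples supporting it).
    NOTE: a simple per-host majority vote, not phybreak's consensus method;
    the result is not guaranteed to be a valid tree.
    """
    n = len(samples)
    summary = {}
    for host in samples[0]:
        # Count how often each candidate infector was sampled for this host.
        counts = Counter(s[host] for s in samples)
        # most_common(1) returns the top (infector, count) pair; ties are
        # resolved by first occurrence in the samples.
        infector, count = counts.most_common(1)[0]
        summary[host] = (infector, count / n)
    return summary

# Toy posterior sample of four transmission trees for hosts A, B, C.
samples = [
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "A", "C": "B"},
    {"A": None, "B": "C", "C": "A"},
]
summary = majority_infectors(samples)
print(summary)
# {'A': (None, 1.0), 'B': ('A', 0.75), 'C': ('B', 0.75)}
```

Reporting the support fraction alongside each infector makes it visible where the posterior is uncertain. This is the kind of information a consensus tree on its own hides.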

Created: 2017-05-19