Data from: Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.7jr85rj
Description:
Discrete phylogeography using software such as BEAST treats the
sampling location of each taxon as fixed, often to a single location
without uncertainty. When studying viruses, this implies that there is no
possibility that the infected host for that taxon is located
somewhere else. Here, we relaxed this strong assumption and allowed for
analytic integration of uncertainty for discrete virus phylogeography. We
used automatic language processing methods to find and assign uncertainty
to alternative potential locations. We considered two influenza case
studies: H5N1 in Egypt and H1N1 pdm09 in North America. For each, we
implemented scenarios in which 25% of the taxa had different amounts of
sampling uncertainty (10%, 30%, or 50%) and varied
how it was distributed for each taxon. These include scenarios that: (i)
placed a specific amount of uncertainty on one location while uniformly
distributing the remaining amount across all other candidate locations
(correspondingly labeled 10, 30, and 50); (ii) assigned the remaining
uncertainty to just one other location, thus “splitting” the uncertainty
between two locations (i.e., 10/90, 30/70, and 50/50); and (iii) eliminated
uncertainty via two pre-defined heuristic approaches: assignment to a
centroid location (CNTR) or to the location with the largest population in the country (POP).
We compared all scenarios to a reference standard in which all taxa had
known (absolutely certain) locations. Starting from this reference, we drew five
random selections of 25% of the taxa and used these to specify
uncertainty. We performed posterior analyses for each scenario, including:
(a) virus persistence, (b) migration rates, (c) trunk rewards, and (d) the
posterior probability of the root state. The scenarios with sampling
uncertainty were closer to the reference standard than CNTR and POP. For
H5N1, the absolute error of virus persistence had a median range of
0.005–0.047 for scenarios with sampling uncertainty, (i) and (ii) above,
versus a range of 0.063–0.075 for CNTR and POP. Persistence for the
pdm09 case study followed a similar trend, as did our analyses of migration
rates across scenarios (i) and (ii). When considering the posterior
probability of the root state, we found that all but one of the H5N1 scenarios
with sampling uncertainty agreed with the reference standard on the
origin of the outbreak, whereas both CNTR and POP disagreed. Our results
suggest that assigning geospatial uncertainty to taxa benefits estimation
of virus phylogeography as compared to ad-hoc heuristics. We also found
that, in general, results differed little regardless of
how the sampling uncertainty was assigned: distributing it uniformly or splitting it
between two locations did not greatly affect posterior results. This
framework is available in BEAST v.1.10. In future work, we will explore
viruses beyond influenza. We will also develop a web interface for
researchers to use our language processing methods to find and assign
uncertainty to alternative potential locations for virus phylogeography.
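
The two ways of distributing sampling uncertainty in scenarios (i) and (ii) above can be illustrated with a minimal Python sketch. It is not the authors' code and not BEAST input. The function names, the location labels, and the reading that the labeled percentage is the probability mass placed on the focal location are all assumptions made for illustration.

```python
from fractions import Fraction


def uniform_scenario(locations, focal, mass):
    """Scenario (i) sketch: put `mass` on `focal`, spread the rest evenly.

    Assumption: `mass` is the probability assigned to the focal location.
    """
    others = [loc for loc in locations if loc != focal]
    if focal not in locations or not others:
        raise ValueError("need the focal location plus at least one other")
    share = (1 - mass) / len(others)  # equal share for every other candidate
    probs = {loc: share for loc in others}
    probs[focal] = mass
    return probs


def split_scenario(locations, focal, other, mass):
    """Scenario (ii) sketch: `mass` on `focal`, the remainder on one `other`."""
    if focal not in locations or other not in locations or focal == other:
        raise ValueError("focal and other must be distinct candidate locations")
    probs = {loc: Fraction(0) for loc in locations}
    probs[focal] = mass
    probs[other] = 1 - mass
    return probs


# Hypothetical candidate locations for a single taxon.
candidates = ["A", "B", "C", "D"]

# Label "10": 1/10 on A, the remaining 9/10 shared equally by B, C, and D.
uniform_10 = uniform_scenario(candidates, "A", Fraction(1, 10))

# Label "10/90": 1/10 on A, 9/10 on B, nothing on C or D.
split_10_90 = split_scenario(candidates, "A", "B", Fraction(1, 10))
```

Exact fractions are used so that each taxon's location probabilities sum to exactly one, which makes the two scenarios easy to check by hand.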
Provider:
Dryad
Created:
2018-12-27



