Online phylogenetics with matOptimize for SARS-CoV-2
Source: DataCite Commons. Updated: 2026-03-16. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.7291/D13Q2J
Description:
Phylogenetics has been foundational to SARS-CoV-2 research and public
health policy, assisting in genomic surveillance, contact tracing, and
assessing emergence and spread of new variants. However, phylogenetic
analyses of SARS-CoV-2 have often relied on tools designed for de novo
phylogenetic inference, in which all data are collected before any
analysis is performed and the phylogeny is inferred once from scratch.
SARS-CoV-2 datasets do not fit this mould. There are currently over 14
million sequenced SARS-CoV-2 genomes in online databases, with tens of
thousands of new genomes added every day. Continuous data collection,
combined with the public health relevance of SARS-CoV-2, invites an
"online" approach to phylogenetics, in which new samples are
added to existing phylogenetic trees every day. The extremely dense
sampling of SARS-CoV-2 genomes also invites a comparison between
likelihood and parsimony approaches to phylogenetic inference. Maximum
likelihood (ML) and pseudo-ML methods may be more accurate when there are
multiple changes at a single site on a single branch, but this accuracy
comes at a large computational cost, and the dense sampling of SARS-CoV-2
genomes means that these instances will be extremely rare because each
internal branch is expected to be extremely short. Therefore, it may be
that approaches based on maximum parsimony (MP) are sufficiently accurate
for reconstructing phylogenies of SARS-CoV-2, and their simplicity means
that they can be applied to much larger datasets. Here, we evaluate the
performance of de novo and online phylogenetic approaches, as well as ML,
pseudo-ML, and MP frameworks for inferring large and dense SARS-CoV-2
phylogenies. Overall, we find that online phylogenetics produces similar
phylogenetic trees to de novo analyses for SARS-CoV-2, and that MP
optimization with UShER and matOptimize produces equivalent SARS-CoV-2
phylogenies to some of the most popular ML and pseudo-ML inference tools.
MP optimization with UShER and matOptimize is thousands of times faster
than presently available implementations of ML, and online phylogenetics is
faster than de novo inference. Our results therefore suggest that
parsimony-based methods like UShER and matOptimize represent an accurate
and more practical alternative to established maximum likelihood
implementations for large SARS-CoV-2 phylogenies.
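
To make the parsimony criterion concrete, the sketch below counts the minimum number of changes a single site requires on a fixed tree, using the textbook Fitch algorithm. It is an illustration only and is not the UShER or matOptimize implementation; the function name `fitch_score`, the nested-tuple tree encoding, and the four sample names are assumptions made for this example.

```python
def fitch_score(tree, leaf_states):
    """Fitch small-parsimony pass for one site on a rooted binary tree.

    tree: a leaf name (str) or a 2-tuple (left, right) of subtrees.
    leaf_states: dict mapping leaf name -> nucleotide observed at this site.
    Returns (candidate_states_at_this_node, minimum_number_of_changes).
    """
    if isinstance(tree, str):
        # A leaf has exactly its observed state and needs no changes.
        return {leaf_states[tree]}, 0

    left_set, left_cost = fitch_score(tree[0], leaf_states)
    right_set, right_cost = fitch_score(tree[1], leaf_states)

    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed here.
        return common, left_cost + right_cost
    # Children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-sample tree: ((s1, s2), (s3, s4))
tree = (("s1", "s2"), ("s3", "s4"))

# Site pattern that fits the tree: one change separates the two clades.
clean_site = {"s1": "A", "s2": "A", "s3": "G", "s4": "G"}
# Site pattern that conflicts with the tree: two changes are required.
conflicting_site = {"s1": "A", "s2": "G", "s3": "A", "s4": "G"}

print(fitch_score(tree, clean_site)[1])        # 1
print(fitch_score(tree, conflicting_site)[1])  # 2
```

Summing this count over all sites gives the parsimony score of a tree. A parsimony optimizer searches for tree rearrangements that lower that sum, which avoids evaluating a substitution model at every step.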
Provider:
Dryad
Created:
2022-10-10



