Online phylogenetics using parsimony produces slightly better trees and is dramatically more efficient for large SARS-CoV-2 phylogenies than de novo and maximum-likelihood approaches
Source: DataCite Commons. Updated: 2026-03-16. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.7291/D1038P
Description:
Phylogenetics has been foundational to SARS-CoV-2 research and public
health policy, assisting in genomic surveillance, contact tracing, and
assessing emergence and spread of new variants. However, phylogenetic
analyses of SARS-CoV-2 have often relied on tools designed for de novo
phylogenetic inference, in which all data are collected before any
analysis is performed and the phylogeny is inferred once from scratch.
SARS-CoV-2 datasets do not fit this mould. There are currently over 5
million sequenced SARS-CoV-2 genomes in public databases, with tens of
thousands of new genomes added every day. Continuous data collection,
combined with the public health relevance of SARS-CoV-2, invites an
"online" approach to phylogenetics, in which new samples are
added to existing phylogenetic trees every day. The extremely dense
sampling of SARS-CoV-2 genomes also invites a comparison between
Likelihood and Parsimony approaches to phylogenetic inference. Maximum
Likelihood (ML) methods are more accurate when there are multiple changes
at a single site on a single branch, but this accuracy comes at a large
computational cost, and the dense sampling of SARS-CoV-2 genomes means
that these instances will be extremely rare. Therefore, it may be that
approaches based on Maximum Parsimony (MP) are sufficiently accurate for
reconstructing phylogenies of SARS-CoV-2, and their simplicity means that
they can be applied to much larger datasets. Here, we evaluate the
performance of de novo and online phylogenetic approaches, and ML and MP
frameworks, for inferring large and dense SARS-CoV-2 phylogenies. Overall,
we find that online phylogenetics produces similar phylogenetic trees to
de novo analyses for SARS-CoV-2, and that MP optimizations produce more
accurate SARS-CoV-2 phylogenies than do ML optimizations. Since MP is
thousands of times faster than presently available implementations of ML,
and online phylogenetics is faster than de novo inference, we propose
that, in the context of comprehensive genomic epidemiology of SARS-CoV-2,
MP online phylogenetics approaches should be favored.
Provider:
Dryad
Created:
2021-12-31



