Data from: Fast dating using least-squares criteria and algorithms
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.968t3
Description:
Phylogenies provide a useful way to understand the evolutionary history of
genetic samples, and data sets with more than a thousand taxa are becoming
increasingly common, notably with viruses (e.g., human immunodeficiency
virus (HIV)). Dating ancestral events is one of the first, essential goals
with such data. However, current sophisticated probabilistic approaches
struggle to handle data sets of this size. Here, we present very fast
dating algorithms, based on a Gaussian model closely related to the
Langley–Fitch molecular-clock model. We show that this model is robust to
uncorrelated violations of the molecular clock. Our algorithms apply to
serial data, where the tips of the tree have been sampled through time.
They estimate the substitution rate and the dates of all ancestral nodes.
When the input tree is unrooted, they can provide an estimate for the root
position, thus representing a new, practical alternative to the standard
rooting methods (e.g., midpoint). Our algorithms exploit the tree
(recursive) structure of the problem at hand, and the close relationships
between least-squares and linear algebra. We distinguish between an
unconstrained setting and the case where the temporal precedence
constraint (i.e., an ancestral node must be older than its daughter nodes)
is accounted for. With rooted trees, the former is solved using linear
algebra in linear computing time (i.e., proportional to the number of
taxa), while the resolution of the latter, constrained setting, is based
on an active-set method that runs in nearly linear time. With unrooted
trees the computing time becomes (nearly) quadratic (i.e., proportional to
the square of the number of taxa). In all cases, very large input trees
(>10,000 taxa) can easily be processed and transformed into
time-scaled trees. We compare these algorithms to standard methods
(root-to-tip, r8s version of Langley–Fitch method, and BEAST). Using
simulated data, we show that their estimation accuracy is similar to that
of the most sophisticated methods, while they run much
faster. We apply these algorithms to a large data set comprising 1194
strains of Influenza virus from the pdm09 H1N1 Human pandemic. Again, the
results show that these algorithms provide a very fast alternative with
results similar to those of other computer programs. These algorithms are
implemented in the LSD software (least-squares dating), which can be
downloaded from http://www.atgc-montpellier.fr/LSD/, along with all our
data sets and detailed results. An Online Appendix, providing additional
algorithm descriptions, tables, and figures, can be found in the
Supplementary Material available on Dryad at
http://dx.doi.org/10.5061/dryad.968t3.
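
The unconstrained setting described above can be illustrated with a small sketch. It is not the LSD implementation: it assumes unit weights on all branches, solves a dense system with NumPy rather than using the linear-time recursion the abstract mentions, and ignores the temporal precedence constraint. The function name `least_squares_dates`, the dictionary-based tree encoding, and the toy tree are all hypothetical. The sketch relies on one observation: if each branch length is modeled as rate × (date of node − date of parent), then substituting u = rate × date for every internal node makes the residuals linear in the unknowns (the rate and the u values), so ordinary linear least squares applies.

```python
import numpy as np

def least_squares_dates(parent, branch_length, tip_date):
    """Unweighted least-squares dating of a rooted tree (dense solve).

    parent:        dict child -> parent, for every non-root node
    branch_length: dict child -> length of the branch above that child
    tip_date:      dict tip -> sampling date
    Returns (rate, dict of estimated dates for internal nodes).
    """
    internal = sorted(set(parent.values()))
    col = {n: k + 1 for k, n in enumerate(internal)}  # column 0 holds the rate
    shift = min(tip_date.values())  # dates relative to the earliest tip

    rows, rhs = [], []
    for child, par in parent.items():
        row = np.zeros(len(internal) + 1)
        if child in tip_date:
            # tip branch: b = rate * (tip date - shift) - u_parent
            row[0] = tip_date[child] - shift
        else:
            # internal branch: b = u_child - u_parent
            row[col[child]] = 1.0
        row[col[par]] -= 1.0
        rows.append(row)
        rhs.append(branch_length[child])

    x, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    rate = x[0]
    # undo the substitution u = rate * (date - shift)
    dates = {n: x[col[n]] / rate + shift for n in internal}
    return rate, dates

# Toy clock-like tree (hypothetical data): root R with children A and t3,
# A with children t1 and t2. Branch lengths were built with rate 0.01,
# R dated 2000 and A dated 2002, so the fit should recover those values
# up to rounding error.
parent = {"A": "R", "t1": "A", "t2": "A", "t3": "R"}
branch_length = {"A": 0.02, "t1": 0.03, "t2": 0.04, "t3": 0.04}
tip_date = {"t1": 2005.0, "t2": 2006.0, "t3": 2004.0}

rate, dates = least_squares_dates(parent, branch_length, tip_date)
print(rate, dates)
```

The rate is only identifiable when the tips have different sampling dates, which is the serial-data setting the abstract refers to. Nothing in this sketch prevents an estimated ancestral date from falling after one of its descendants on noisy data; handling that case is what the constrained, active-set variant in the abstract addresses.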
Provider:
Dryad
Created:
2015-09-25



