Ambiguity coding allows accurate inference of evolutionary parameters from alignments in an aggregated state-space
Source: DataCite Commons. Updated: 2026-03-11. Indexed: 2026-04-25.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.tx95x69sm
Description:
How can we best learn the history of a protein's evolution? Ideally,
a model of sequence evolution should capture both the process that
generates genetic variation and the functional constraints determining
which changes are fixed. However, in practical terms the most suitable
approach may simply be the one that combines the convenience of easily
available input data with the ability to return useful parameter
estimates. For example, we might be interested in a measure of the
strength of selection (typically obtained using a codon model) or an
ancestral structure (obtained using structural modelling based on inferred
amino acid sequence and side chain configuration). But what if data in the
relevant state-space are not readily available? We show that it is
possible to obtain accurate estimates of the outputs of interest using an
established method for handling missing data. Encoding observed characters
in an alignment as ambiguous representations of characters in a larger
state-space allows the application of models with the desired features to
data that lack the resolution that is normally required. This strategy is
viable because the evolutionary path taken through the observed space
contains information about states that were likely visited in the
"unseen" state-space. To illustrate this, we consider two
examples with amino acid sequences as input. We show that ω, a parameter
describing the relative strength of selection on non-synonymous and
synonymous changes, can be estimated in an unbiased manner using an
adapted version of a standard 61-state codon model. Using simulated and
empirical data, we find that ancestral amino acid side chain configuration
can be inferred by applying a 55-state empirical model to 20-state amino
acid data. Where feasible, combining inputs from both ambiguity-coded and
fully resolved data improves accuracy. Adding structural information to as
few as 12.5% of the sequences in an amino acid alignment results in
remarkable ancestral reconstruction performance compared to a benchmark
that considers the full rotamer state information. These examples show
that our methods permit the recovery of evolutionary information from
sequences where it has previously been inaccessible.
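
The ambiguity-coding idea for the codon example can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard genetic code and the usual convention of representing an ambiguous tip as an indicator vector over the larger state-space, and the names `SENSE_CODONS` and `tip_vector` are hypothetical. Each observed amino acid is treated as "any of the sense codons that encode it", so a 61-state codon model can in principle be applied to 20-state data.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
GENETIC_CODE = dict(zip(CODONS, AMINO))

# The 61 sense codons form the larger ("unseen") state-space.
SENSE_CODONS = [c for c in CODONS if GENETIC_CODE[c] != "*"]

# Symbols treated as fully missing data (an assumption of this sketch).
MISSING = {"-", "?", "X"}


def tip_vector(amino_acid):
    """Return a 61-entry indicator vector for one observed amino acid.

    An entry is 1.0 if the corresponding codon is compatible with the
    observation and 0.0 otherwise. Missing data is compatible with
    every codon.
    """
    if amino_acid in MISSING:
        return [1.0] * len(SENSE_CODONS)
    vector = [1.0 if GENETIC_CODE[c] == amino_acid else 0.0 for c in SENSE_CODONS]
    if not any(vector):
        raise ValueError(f"unknown amino acid symbol: {amino_acid!r}")
    return vector


if __name__ == "__main__":
    for aa in "MLW-":
        compatible = [c for c, v in zip(SENSE_CODONS, tip_vector(aa)) if v]
        print(aa, len(compatible), compatible[:6])
```

In a likelihood calculation, such vectors would stand in for the usual one-hot tip states, so the observed amino acid constrains but does not fully determine the codon at that site. The same pattern would apply to the rotamer example, with each amino acid mapped to its compatible states in the 55-state space, though that mapping is not given in this description.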
Provider:
Dryad
Created:
2020-04-08



