Measuring phylogenetic information of incomplete sequence data
Source: DataCite Commons. Updated 2025-04-01; indexed 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.zs7h44j9f
Description:
Widely used approaches for extracting phylogenetic information from
aligned sets of molecular sequences rely upon probabilistic models of
nucleotide substitution or amino-acid replacement. The phylogenetic
information that can be extracted depends on the number of columns in the
sequence alignment and will be decreased when the alignment contains gaps
due to insertion or deletion events. Motivated by the measurement of
information loss, we suggest assessment of the Effective Sequence Length
(ESL) of an aligned data set. The ESL can differ from the actual number of
columns in a sequence alignment because of the presence of alignment gaps.
Furthermore, the estimation of phylogenetic information is affected by
model misspecification. Inevitably, the actual process of molecular
evolution differs from the probabilistic models employed to describe this
process. This disparity means the amount of phylogenetic information in an
actual sequence alignment will differ from the amount in a simulated data
set, which motivated us to develop a new test for model adequacy. Via
theory and empirical data analysis, we show how to disentangle the effects
of gaps and model misspecification. By comparing the Fisher information of
actual and simulated sequences, we identify which alignment sites and tree
branches are most affected by gaps and model misspecification.
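
As a rough illustration of why alignment gaps reduce phylogenetic information, the sketch below computes the expected Fisher information about the evolutionary distance between two aligned sequences. It is not the ESL estimator described in this record. The Jukes-Cantor (JC69) model, the two-sequence setting, and all function names are assumptions chosen to keep the example small. Under these assumptions each column in which both sequences have a residue contributes the same amount of information, so gapped columns simply drop out of the total.

```python
import math


def jc69_site_information(d):
    """Expected Fisher information about distance d from one ungapped column.

    Assumes JC69, where the probability that two residues differ is
    p(d) = 3/4 * (1 - exp(-4d/3)). Each column is then a Bernoulli trial,
    whose information about d is p'(d)^2 / (p(d) * (1 - p(d))).
    """
    if d <= 0:
        raise ValueError("distance d must be positive")
    decay = math.exp(-4.0 * d / 3.0)
    p = 0.75 * (1.0 - decay)  # probability the two residues differ
    dp = decay                # derivative of p with respect to d
    return dp * dp / (p * (1.0 - p))


def usable_columns(seq_a, seq_b, gap="-"):
    """Count columns where neither sequence has a gap character."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != gap and b != gap)


def pairwise_information(seq_a, seq_b, d, gap="-"):
    """Total expected information about d: usable columns times per-site information."""
    return usable_columns(seq_a, seq_b, gap) * jc69_site_information(d)


def toy_effective_length(seq_a, seq_b, d, gap="-"):
    """Hypothetical stand-in for an effective length: the number of complete
    columns that would carry the same information as the gapped alignment."""
    return pairwise_information(seq_a, seq_b, d, gap) / jc69_site_information(d)
```

In this toy setting the effective length equals the number of ungapped columns. On a tree with more than two sequences a partly gapped column still carries information about some branches, so a column count no longer summarises what the gaps remove.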
Provider:
Dryad
Created:
2021-09-02



