Performance of Akaike information criterion and Bayesian information criterion in selecting partition models and mixture models
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.1jwstqjwj
Description:
In molecular phylogenetics, partition models and mixture models provide
different approaches to accommodating heterogeneity in genomic sequencing
data. Both types of models generally fit the data better than
models that assume the process of sequence evolution is homogeneous across
sites and lineages. The Akaike Information Criterion (AIC), an estimator
of Kullback-Leibler divergence, and the Bayesian Information Criterion
(BIC) are popular tools to select models in phylogenetics. Recent work
suggests AIC should not be used for comparing mixture and partition
models. In this work, we clarify that this difficulty is not fully
explained by AIC misestimating the Kullback-Leibler divergence. We also
investigate the performance of AIC and BIC in selecting among
mixture models and among partition models. We find that under
non-standard conditions (i.e. when some edges have a small expected number
of changes), AIC underestimates the expected Kullback-Leibler divergence.
Under such conditions, AIC preferred the complex mixture models and BIC
preferred the simpler mixture models. The mixture models selected by AIC
performed better in estimating edge lengths, while the simpler
models selected by BIC performed better in estimating the base frequencies
and substitution rate parameters. In contrast, AIC and BIC both preferred
simpler partition models over more complex partition models under
non-standard conditions, despite the fact that the more complex partition
model was the generating model. We also investigated how
mispartitioning (i.e. grouping sites that have not evolved under the same
process) affects both the performance of partition models compared to
mixture models and the model selection process. We found that as the level
of mispartitioning increases, the bias of AIC in estimating the expected
Kullback-Leibler divergence remains the same, and the branch lengths and
evolutionary parameters estimated by partition models become less
accurate. We recommend that researchers be cautious when using
AIC and BIC to select among partition and mixture models; alternatives
such as cross-validation and bootstrapping should be explored, although
they may suffer from similar limitations.
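
For readers unfamiliar with the two criteria, the sketch below shows their standard textbook forms and how they can disagree on the same pair of models. It is only an illustration, not the authors' analysis code: the model names, log-likelihoods, parameter counts, and alignment length are made-up values, and treating the number of alignment sites as the sample size n in BIC is an assumption.

```python
import math


def aic(log_lik: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_lik


def bic(log_lik: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return k * math.log(n) - 2 * log_lik


def best_model(scores: dict) -> str:
    """Return the name of the model with the lowest criterion score."""
    return min(scores, key=scores.get)


# Hypothetical fits: name -> (maximised log-likelihood, free parameters).
fits = {
    "simple": (-10500.0, 10),
    "complex": (-10480.0, 25),
}
n_sites = 1000  # assumed sample size: number of alignment sites

aic_scores = {name: aic(lnl, k) for name, (lnl, k) in fits.items()}
bic_scores = {name: bic(lnl, k, n_sites) for name, (lnl, k) in fits.items()}

print("AIC prefers:", best_model(aic_scores))
print("BIC prefers:", best_model(bic_scores))
```

With these made-up numbers the likelihood gain of the complex model outweighs AIC's penalty of 2 per parameter but not BIC's penalty of ln(1000) per parameter, so the two criteria select different models.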
Provider:
Dryad
Created:
2022-06-07



