nRCFV: A sequence, taxon and character state-normalised metric for the pre-reconstruction evaluation of compositional heterogeneity
Source: DataCite Commons (updated 2025-05-01; indexed 2025-05-10)
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.wpzgmsbpn
Description:
Motivation: Compositional heterogeneity, in which the proportions of
nucleotides or amino acids are not broadly similar across the dataset,
is a cause of a great number of phylogenetic artefacts. Whilst a variety
of methods can identify it post hoc, few metrics exist to quantify
compositional heterogeneity before the computationally intensive task of
phylogenetic tree reconstruction. Here we assess the efficacy of one such
existing, widely used metric, Relative Composition Frequency Variability
(RCFV), using both real and simulated data.
Results: Our results show that RCFV can be biased by sequence length, the
number of taxa, and the number of possible character states within the
dataset. However, we also find that missing data does not appear to have
an appreciable effect on RCFV. We discuss the theory behind these findings
and their consequences for the future use of RCFV, and propose a new
metric, nRCFV, which accounts for these biases. Alongside this, we present
new software, called nRCFV_Reader, that easily calculates both RCFV and
nRCFV.
Availability and Implementation: nRCFV has been implemented in
RCFV_Reader, available at: https://github.com/JFFleming/RCFV_Reader. Both
our simulation and real data are available in this dataset.
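To make the quantity concrete, the following is a minimal sketch of the unnormalised RCFV as it is commonly defined in the literature: the sum, over character states and taxa, of the absolute deviation of each taxon's state frequency from the across-taxon mean, divided by the number of taxa. This is an illustration only, not the RCFV_Reader implementation. The function name `rcfv`, the dictionary input format, and the choice to compute frequencies over recognised states only (skipping gaps and ambiguity codes) are assumptions made for this example. The nRCFV normalisation is not reproduced here because its formula is not given in this record.

```python
from collections import Counter


def rcfv(alignment, states="ACGT"):
    """Unnormalised RCFV for an alignment given as {taxon: sequence}.

    Hypothetical helper for illustration: frequencies are computed per taxon
    over the characters found in `states`; anything else (gaps, ambiguity
    codes) is ignored, which is an assumption of this sketch.
    """
    n_taxa = len(alignment)
    if n_taxa == 0:
        raise ValueError("alignment is empty")

    # Per-taxon frequency of each character state.
    freqs = []
    for taxon, seq in alignment.items():
        counts = Counter(c for c in seq.upper() if c in states)
        total = sum(counts.values())
        if total == 0:
            raise ValueError(f"taxon {taxon!r} has no recognised characters")
        freqs.append({s: counts[s] / total for s in states})

    # Sum absolute deviations from the across-taxon mean, state by state.
    total_deviation = 0.0
    for s in states:
        mean = sum(f[s] for f in freqs) / n_taxa
        total_deviation += sum(abs(f[s] - mean) for f in freqs)

    return total_deviation / n_taxa


if __name__ == "__main__":
    homogeneous = {"t1": "ACGTACGT", "t2": "TGCATGCA"}
    heterogeneous = {"t1": "AAAA", "t2": "CCCC"}
    print(rcfv(homogeneous))
    print(rcfv(heterogeneous))
```

For amino acid data, the same function could be called with `states` set to the 20 standard residue letters. Because the sum runs over every state and every taxon, the raw value depends on the number of states and taxa, which is the kind of bias the abstract reports.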
Provider:
Dryad
Created:
2023-02-02



