The Dayhoff Exchange Score: A new metric to quantify site saturation in amino acid datasets prior to phylogenetic analysis
Source: DataONE. Updated 2024-08-12; indexed 2025-04-26.
Download link:
https://search.dataone.org/view/sha256:23c4daf1da9bfdfe2394173416033fd6962e4e833530415e881ae09c5e6d8a08
Resource description:
Site saturation is a persistent problem in phylogenetic analyses, where it can reduce the accuracy of topology reconstruction. It is fundamentally caused by large amounts of independent change along branches, which leaves the model unable to distinguish phylogenetic signal from noise. The Dayhoff Exchange Score (DE-score) is a new metric for assessing site saturation within and between amino acid datasets. It provides both a whole-dataset overview and taxon-specific values that represent the contribution of a given taxon to whole-dataset saturation. We first assess the efficacy of this score at detecting increased site saturation over 20,000 simulation datasets, compare it to the existing Slope R2 score, and then assess its efficacy in the face of the potentially confounding factors of increasing taxon number, number of positions in the alignment, missing data and noise. Finally, we use the DE-score to re-evaluate a previously published dataset by Kocot et al. (2017), to illustrate it...

Establishing the ability of Dayhoff Category Exchange Ratios to detect Site Saturation
To assess the utility of directly measuring the ratio of between-Dayhoff-group to within-Dayhoff-group mutations as an indicator of site saturation within a dataset, we used simulation datasets previously used to assess site saturation by Hernandez and Ryan (2021). These datasets, originally based on the Chang et al. (2015) tree, were generated by applying a scaling factor, from 1 to 20, to all branches of the original tree. The datasets were generated under two models, the Dayhoff model and the JTT model. Each scaling factor comprises 1,000 datasets, giving 20 × 1,000 = 20,000 datasets per model and 40,000 datasets in total.
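The idea of a between-group to within-group exchange ratio can be illustrated with a minimal Python sketch. This is not the authors' DE-score implementation, whose exact definition is not given in this description. It assumes the standard six Dayhoff categories (AGPST, C, DENQ, FWY, HKR, ILMV) and uses pairwise differences between aligned sequences as a stand-in for mutations. The function names `exchange_counts` and `exchange_ratio` are hypothetical.

```python
from itertools import combinations

# Standard six Dayhoff categories; their use here is an assumption of this sketch.
DAYHOFF_GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
AA_TO_GROUP = {aa: i for i, group in enumerate(DAYHOFF_GROUPS) for aa in group}


def exchange_counts(seq_a, seq_b):
    """Count (within-group, between-group) differences for two aligned sequences."""
    within = between = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        # Skip identical residues, gaps and ambiguity codes.
        if a == b or a not in AA_TO_GROUP or b not in AA_TO_GROUP:
            continue
        if AA_TO_GROUP[a] == AA_TO_GROUP[b]:
            within += 1
        else:
            between += 1
    return within, between


def exchange_ratio(alignment):
    """Between/within ratio pooled over all sequence pairs (hypothetical summary).

    `alignment` maps taxon names to aligned sequences of equal length.
    Returns None when there are no within-group differences to divide by.
    """
    within = between = 0
    for seq_a, seq_b in combinations(alignment.values(), 2):
        w, b = exchange_counts(seq_a, seq_b)
        within += w
        between += b
    return between / within if within else None


if __name__ == "__main__":
    toy = {"t1": "AGDK", "t2": "SGEK", "t3": "SCNR"}
    print(exchange_ratio(toy))
```

Under this simplification, a higher ratio means that a larger share of the observed differences cross Dayhoff category boundaries, which is the kind of signal the paragraph above associates with increasing saturation.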
Assessing the Effects of Missing Data and an increasing Number of Taxa and Positions on the Dayhoff Category Exchange Ratio
To assess the effect of changes in the number of taxa and positions in an alignmen...

# The Dayhoff Exchange Score: A new metric to quantify site saturation in amino acid datasets prior to phylogenetic analysis
[https://doi.org/10.5061/dryad.34tmpg4tm](https://doi.org/10.5061/dryad.34tmpg4tm)
## Description of the data and file structure
This supplemental information accompanies the DEScore paper, and is formatted as follows:
Inside the directory DEScoreSuppData you will find 6 folders:
1. Kocot2017
2. MissingData_Simulations
3. NoiseSimulations
4. Other
5. PositionAndTaxaIncrease_DEScore
6. SaturationSims
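
After downloading and unpacking the archive, the layout listed above can be checked with a short Python sketch. The folder names are taken from the list; the helper name `missing_folders` and the assumption that the archive unpacks to a local `DEScoreSuppData` directory are illustrative only.

```python
from pathlib import Path

# Folder names as listed in this README.
EXPECTED_FOLDERS = [
    "Kocot2017",
    "MissingData_Simulations",
    "NoiseSimulations",
    "Other",
    "PositionAndTaxaIncrease_DEScore",
    "SaturationSims",
]


def missing_folders(root):
    """Return the expected sub-folders that are absent under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_FOLDERS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical local path to the unpacked archive.
    print(missing_folders("DEScoreSuppData"))
```

An empty list means all six folders are present.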
### 1. Kocot2017
```
This directory contains two further directories, Kocot2017_Slope and DEScore_Reselected. Kocot2017_Slope comprises 6 directories: copies of the Best 106 genes through to Best 532 genes, as selected by the slope measurements in Kocot et al. (2017). Each of these directories contains a PHYLIP alignment file, a partition file in txt format, and the output of a RAxML analysis: a most likely tree (RAxML_bipartitions..tre), bootstrap...
```
Created:
2024-08-13



