The Dayhoff Exchange Score: A new metric to quantify site saturation in amino acid datasets prior to phylogenetic analysis
Source: DataONE. Updated 2025-12-18; indexed 2025-12-20.
Download link:
https://search.dataone.org/view/sha256:92010014469b82acc50db8505f083f83ff466ae3851814c7e1a7517e05586c1d
Description:
Entropic site saturation is a persistent problem in phylogenetic analyses, where it can reduce the accuracy of topology reconstruction. It is caused by large amounts of independent change along branches, which leaves the model unable to distinguish phylogenetic signal from noise. The Dayhoff Exchange Score (DE-Score) is a new metric for assessing this form of site saturation within and between amino acid datasets. It provides both a whole-dataset overview and taxon-specific values that represent each taxon's contribution to the entropic site saturation of the whole dataset. We first assess how well the score detects increased entropic site saturation across 20,000 simulated datasets and compare it to the existing Slope R2 score. We then assess its efficacy in the face of potentially confounding factors: increasing taxon number, number of positions in the alignment, missing data, and noise. Finally, we use the DE-Score to re-evaluate several previously publ...

The methods and their implications are explored in greater detail in the PDF file inside the folder 4_Other.
1_Kocot2017: Applying the DE-Score to real data: Reselection Datasets
The work of Kocot et al. (2017) (Kocot, K.M., Struck, T.H., et al. 2017) was chosen for more thorough examination because the Slope score (Nosenko, T., Schreiber, F., et al. 2013) had previously been used to assess site saturation within genes in this dataset. This allowed us to assess the coherence between the two metrics and the effect of their differences.
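One simple way to picture coherence between two gene-level metrics is to take the best-scoring fraction of genes under each metric and measure how much the two selections overlap. The sketch below is an illustration only, not the authors' procedure: the function names, the `lower_is_better` flag, the Jaccard overlap measure, and all score values are assumptions made for the example. Which direction counts as "less saturated" for either score must be taken from the paper.

```python
def best_fraction(scores, fraction=1 / 6, lower_is_better=True):
    """Return the set of genes in the best `fraction` of a gene -> score mapping.

    `lower_is_better` is an assumption of this sketch; set it to match the
    actual direction of the metric being ranked.
    """
    # Sort by (score, name) so that ties are broken deterministically.
    ranked = sorted(scores, key=lambda g: (scores[g], g),
                    reverse=not lower_is_better)
    k = max(1, round(len(ranked) * fraction))
    return set(ranked[:k])


def jaccard(a, b):
    """Jaccard overlap of two sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Made-up scores for six hypothetical genes (not values from the dataset).
metric_one = {"geneA": 0.12, "geneB": 0.40, "geneC": 0.05,
              "geneD": 0.33, "geneE": 0.27, "geneF": 0.51}
metric_two = {"geneA": 0.20, "geneB": 0.70, "geneC": 0.30,
              "geneD": 0.10, "geneE": 0.50, "geneF": 0.90}

picked_one = best_fraction(metric_one, fraction=1 / 3)
picked_two = best_fraction(metric_two, fraction=1 / 3)
print(sorted(picked_one), sorted(picked_two), jaccard(picked_one, picked_two))
```

With the default `fraction=1 / 6` the same function selects a sextile. An overlap near 1 would mean the two metrics select nearly the same genes, and an overlap near 0 would mean they select largely different ones.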
For this reselection study, a phylogenetic tree was recovered using IQTree-mpi (v1.6.12) (Nguyen, L.-T., Schmidt, H.A., et al. 2015) and the LG+F model, as in the original analysis, but instead using the sextile of least saturated genes, as selected by the DE-Score, and then a c...

# **README**
This supplemental information accompanies the DEScore paper, and is formatted as follows:
Inside the directory DEScoreSuppData you will find 8 zip files:
1_Kocot2017.zip
2_MissingData_Simulations.zip
3_NoiseSimulations.zip
4_Other.zip
5_PositionAndTaxaIncrease_DEScore.zip
6_SaturationSims.zip
7_RealDataStudies.zip
8_ComparisonWithSatuRation.zip
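The archives listed above can be unpacked one at a time. The following is a minimal sketch using Python's standard `zipfile` module. It assumes the zip files have already been downloaded into a single local directory. The function name `extract_archives` and the one-subfolder-per-archive layout are choices made for this example, not part of the dataset.

```python
import zipfile
from pathlib import Path

# Archive names as listed in the README.
ARCHIVES = [
    "1_Kocot2017.zip",
    "2_MissingData_Simulations.zip",
    "3_NoiseSimulations.zip",
    "4_Other.zip",
    "5_PositionAndTaxaIncrease_DEScore.zip",
    "6_SaturationSims.zip",
    "7_RealDataStudies.zip",
    "8_ComparisonWithSatuRation.zip",
]


def extract_archives(src_dir, dest_dir, names=ARCHIVES):
    """Extract each listed archive found in src_dir into dest_dir/<archive stem>.

    Returns the list of archive names that were not found in src_dir.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    missing = []
    for name in names:
        archive = src / name
        if not archive.is_file():
            missing.append(name)
            continue
        target = dest / archive.stem  # e.g. dest/4_Other
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return missing
```

For example, `extract_archives("DEScoreSuppData", "extracted", names=["4_Other.zip"])` would unpack only the archive that holds the scripts and the PDF, and return an empty list if that archive was present.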
## **1. Kocot2017**
```
This directory contains three further directories, Kocot2017_Slope, Best532_LB_DE_nRCFV and DEScore_Reselected.
Kocot2017_Slope comprises 6 directories: these are copies of the Best 106 genes through to Best 532 genes as selected by the Slope Measurements in Kocot et al. 2017. These directories each contain a phylip alignment file, a partition file in txt format, and the output of a RAxML analysis: a most likely tree (RAxML_bipartitions.<Slope Sextile>.tre), bootstrap support files, and a bipartition file.
Best532_LB_DE_nRCFV contains the results of the Kocot et al. 2017 reselection based on the gene...
```

**Changes after Aug 12, 2024:**
This is the third upload of this supplemental dataset, and significant changes have occurred between this version and the original upload.
v3.0 (DEC 17th 2025)
4_Other - Both scripts, Noisemaker.pl and DEScoreCalculator.pl, have been significantly revised. Noisemaker has been rewritten from Ruby to Perl. DEScoreCalculator has moved to version 1.5, which introduces new, faster logic for calculating the DE-Score, removes its Perl module dependencies, and adds POD documentation.
7_RealDataStudies - The number of multi-gene datasets has expanded from 5 to 21.
v2.0 (AUG 4th 2025)
Most notably, this version divides the supplemental data into individual zip files, allowing users to more easily access the data they are interested in. This comes with a new data structure that groups the data into 8 categories: 1_Kocot2017.zip, 2_MissingData_Simulations.zip, 3_NoiseSimulations.zip, 4_Other.zip, 5_PositionAndTaxaIncrease_DEScore.zip, 6_SaturationSims.zip, 7_...
Created:
2025-12-18



