
The Dayhoff Exchange Score: A new metric to quantify site saturation in amino acid datasets prior to phylogenetic analysis

DataONE · Updated 2025-12-22 · Indexed 2025-12-27
Download link:
https://search.dataone.org/view/sha256:4873b69d2a878666ffa2a88bd898d677d10ec087c3eb5f956039a9a2a4e7463e
Resource description:
Entropic site saturation is a persistent problem in phylogenetic analyses, where it can hinder the accuracy of topology reconstruction. It is fundamentally caused by large amounts of independent change along branches, which leave the model unable to distinguish phylogenetic signal from noise. The Dayhoff Exchange Score (DE-Score) is a new metric for assessing this form of site saturation within and between amino acid datasets. It provides both a whole-dataset overview and taxon-specific values that represent the contribution of a given taxon to the entropic site saturation of the whole dataset. We first assess the efficacy of this score at detecting increased entropic site saturation over 20,000 simulation datasets, compare it to the existing Slope R2 score, and then assess its efficacy in the face of the potentially confounding factors of increasing taxon number, number of positions in the alignment, missing data, and noise. Finally, we use the DE-Score to re-evaluate several previously publ...

The methods and their implications are explored in greater detail in the pdf file that can be found inside folder 4_Other.

1_Kocot2017: Applying the DE-Score to real data: Reselection Datasets

The work of Kocot et al (2017) (Kocot, K.M., Struck, T.H., et al. 2017) was chosen for more thorough examination. It was chosen for reanalysis because the Slope score (Nosenko, T., Schreiber, F., et al. 2013) had previously been used to assess site saturation within genes in this dataset, which allowed us to assess the coherence between the two metrics and the effect of their differences.

For this reselection study, a phylogenetic tree was recovered using IQTree-mpi (v1.6.12) (Nguyen, L.-T., Schmidt, H.A., et al. 2015) and the LG+F model, as in the original analysis, but instead using the sextile of least saturated genes, as selected by the DE-Score, and then a c...

# **README**

This supplemental information accompanies the DEScore paper, and is formatted as follows:

Inside the directory DEScoreSuppData you will find 8 zipped folders:

- 1_Kocot2017.zip
- 2_MissingData_Simulations.zip
- 3_NoiseSimulations.zip
- 4_Other.zip
- 5_PositionAndTaxaIncrease_DEScore.zip
- 6_SaturationSims.zip
- 7_RealDataStudies.zip
- 8_ComparisonWithSatuRation.zip

## **1. Kocot2017**

This directory contains three further directories: Kocot2017_Slope, Best532_LB_DE_nRCFV and DEScore_Reselected.

Kocot2017_Slope comprises 6 directories: these are copies of the Best 106 genes through to Best 532 genes as selected by the Slope Measurements in Kocot et al 2017. These directories each contain a phylip alignment file, a partition file in txt format, and the output of a RAxML analysis: a most likely tree (`RAxML_bipartitions.<Slope Sextile>.tre`), bootstrap support files, and a bipartition file.

Best532_LB_DE_nRCFV contains the results of the Kocot et al 2017 reselection based on the gene...

**Changes after Dec 17, 2025:**

v4.0 (Dec 22nd 2025)

4_Other - A reupload of this section to match the most recent version on GitHub, which improves the user experience through improved commenting and changes the logical flow of the Noisemaker.pl script for accessibility.

**Changes after Aug 12, 2024:**

This is the third upload of this supplemental dataset, and significant changes have occurred between this version and the original upload.

v3.0 (Dec 17th 2025)

4_Other - Both scripts, Noisemaker.pl and DEScoreCalculator.pl, have been significantly revised. Noisemaker has changed from a Ruby script to a Perl script, and DEScoreCalculator has moved to Version 1.5, which includes a new, faster logic for calculating the DE-Score, a removal of Perl module dependencies, and a POD.

7_Real Data Studies - The multi-gene datasets have expanded from 5 to 21.

v2.0 (Aug 4th 2025)

Most notably, this version divides the supplemental data into individual zip files, allowing ...
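
The reselection step described above keeps the sextile of least saturated genes as ranked by the DE-Score. As a rough illustration only, that selection can be sketched in Python. Everything named here is hypothetical: the function `least_saturated_sextile`, its `lower_is_less_saturated` flag, and the input scores are not taken from DEScoreCalculator.pl, and the listing does not state whether a lower or higher DE-Score indicates less saturation, so the direction is left as a parameter.

```python
def least_saturated_sextile(scores, lower_is_less_saturated=True):
    """Return the names of the genes in the least saturated sixth of `scores`.

    `scores` maps a gene name to a per-gene saturation score.
    ASSUMPTION: the direction of the score (whether low or high means
    "less saturated") is not given in the listing, so it is a parameter.
    """
    if not scores:
        return []

    # Rank genes from least to most saturated; ties are broken by gene
    # name so that the result is deterministic.
    sign = 1 if lower_is_less_saturated else -1
    ranked = sorted(scores, key=lambda gene: (sign * scores[gene], gene))

    # Keep one sixth of the genes, rounding down but never below one gene.
    # Rounding down is a choice made for this sketch.
    keep = max(1, len(ranked) // 6)
    return ranked[:keep]


if __name__ == "__main__":
    # Made-up scores for twelve hypothetical genes.
    example = {f"gene{i:02d}": i / 10 for i in range(12)}
    print(least_saturated_sextile(example))
```

The returned gene names would then identify which alignments to concatenate before tree inference. That concatenation and the tree search itself are outside this sketch.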
Created:
2025-12-23