Supplementary data for: DNA sequences are as useful as protein sequences for inferring deep phylogenies

Source: DataONE. Updated 2023-06-28; indexed 2025-08-02.

Download link:

https://search.dataone.org/view/sha256:285d983759c39f7413b9c1441b32ae7d9291b81d44cc35581b09c9cc33587430

Resource description:

Inference of deep phylogenies has almost exclusively used protein rather than DNA sequences, based on the perception that protein sequences are less prone than DNA sequences to homoplasy, saturation, and compositional heterogeneity. Here we analyze a model of codon evolution under an idealized genetic code and demonstrate that those perceptions may be misconceptions. We conduct a simulation study to assess the utility of protein versus DNA sequences for inferring deep phylogenies, with protein-coding data generated under models of heterogeneous substitution processes across sites in the sequence and among lineages on the tree, and then analyzed using nucleotide, amino acid, and codon models. Analysis of DNA sequences under nucleotide-substitution models (possibly with the third codon positions excluded) recovered the correct tree at least as often as analysis of the corresponding protein sequences under modern amino acid models. We also applied the different data-analysis ...

The repository includes scripts, control files and empirical data used in the manuscript: Kapli P., Kotari I., Telford M., Goldman N., Yang Z. "DNA Sequences Are as Useful as Protein Sequences for Inferring Deep Phylogenies."

Brief explanations of the files:

The script "convert.py" takes a codon alignment as input and outputs the equivalent amino acid alignment and the DNA alignment of the 1st and 2nd codon positions.

"HOMO-control.txt" is the control file for simulating sequences under the homogeneous model with INDELible (http://abacus.gene.ucl.ac.uk/software/indelible/). All guide trees and model parameters (M0 and M3) are provided in the file.

"SH1-control.txt" is the control file for simulating sequences under the site-heterogeneous (SH1) model. SH1 assumes site-heterogeneous codon frequencies generated from observed frequencies in coding genes from mammal species. All guide trees and model parameters (M0 and M3) are provided in the file.

The two Python scripts "generate_control_M0_SH2.py" and "generate_control_M3_SH2.py" create con...
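The conversion described for "convert.py" (codon alignment in, amino acid alignment and 1st+2nd codon position alignment out) can be illustrated with a minimal sketch. This is not the authors' script. It assumes the standard genetic code, holds the alignment in a plain dictionary rather than reading a file, and all function names (`split_codons`, `translate`, `first_second_positions`, `convert`) are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def split_codons(seq):
    """Split an aligned coding sequence into consecutive codons."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a codon-aligned sequence; '---' becomes '-', unknown codons 'X'."""
    residues = []
    for codon in split_codons(seq.upper()):
        if codon == "---":
            residues.append("-")
        else:
            residues.append(CODON_TABLE.get(codon, "X"))
    return "".join(residues)


def first_second_positions(seq):
    """Keep the 1st and 2nd position of every codon, dropping the 3rd."""
    return "".join(codon[:2] for codon in split_codons(seq))


def convert(alignment):
    """Return (amino acid alignment, 1st+2nd position DNA alignment)."""
    protein = {name: translate(seq) for name, seq in alignment.items()}
    dna12 = {name: first_second_positions(seq) for name, seq in alignment.items()}
    return protein, dna12


if __name__ == "__main__":
    # Toy alignment with made-up taxon names, for illustration only.
    example = {"taxonA": "ATGGCCAAA---TGG", "taxonB": "ATGGCTAAGGGATGG"}
    protein, dna12 = convert(example)
    for name in example:
        print(name, protein[name], dna12[name])
```

In the toy alignment the two taxa differ only at third codon positions (GCC vs. GCT, AAA vs. AAG) and in one gapped codon, so apart from the gap column their amino acid and 1st+2nd position alignments are identical. This shows the kind of signal that is discarded when third positions are excluded or the data are translated.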

Created:

2025-07-20