Data from: Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study

DataONE | Updated 2012-10-12 | Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Description:

Phylogenomic analyses of ancient relationships are usually performed using amino acid data, but it is unclear whether amino acids or nucleotides should be preferred. With the twofold aim of addressing this problem and clarifying pancrustacean relationships, we explored the signals in the 62 protein-coding genes carefully assembled by Regier et al. in 2010. With reference to the pancrustaceans, this data set infers a highly supported nucleotide tree that is substantially different from the corresponding, but poorly supported, amino acid one. We show that the discrepancy between the nucleotide-based and the amino acid-based trees is caused by substitutions within synonymous codon families (especially those of serine: TCN and AGY). We show that different arthropod lineages are differentially biased in their usage of serine, arginine, and leucine synonymous codons, and that the serine bias is correlated with the topology derived from the nucleotides, but not the amino acids. We suggest that a parallel, partially compositionally driven, synonymous codon-usage bias affects the nucleotide topology. As substitutions between serine codon families can proceed through threonine or cysteine intermediates, amino acid data sets might also be affected by the serine codon-usage bias. We suggest that a Dayhoff recoding strategy would partially ameliorate the effects of such bias. Although amino acids provide an alternative hypothesis of pancrustacean relationships, neither the nucleotide nor the amino acid version of this data set seems to bring enough genuine phylogenetic information to robustly resolve the relationships within this group, which should still be considered unresolved.
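
The two technical ideas in the description, the split of serine codons into the TCN and AGY families and Dayhoff recoding, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names `serine_family_counts` and `dayhoff_recode` are hypothetical, the input is assumed to be an in-frame coding sequence or an aligned protein string, and the six categories follow the commonly used Dayhoff grouping (AGPST, DENQ, HKR, ILMV, FWY, C).

```python
# Commonly used Dayhoff six-category grouping of the 20 amino acids.
# The category labels "1".."6" are an arbitrary choice for this sketch.
DAYHOFF_GROUPS = {
    "AGPST": "1",
    "DENQ": "2",
    "HKR": "3",
    "ILMV": "4",
    "FWY": "5",
    "C": "6",
}
DAYHOFF_MAP = {aa: code for group, code in DAYHOFF_GROUPS.items() for aa in group}

# The two serine codon families named in the description.
TCN = {"TCT", "TCC", "TCA", "TCG"}
AGY = {"AGT", "AGC"}


def serine_family_counts(cds):
    """Count serine codons per family in an in-frame coding sequence (hypothetical helper)."""
    seq = cds.upper().replace("U", "T")
    counts = {"TCN": 0, "AGY": 0}
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        if codon in TCN:
            counts["TCN"] += 1
        elif codon in AGY:
            counts["AGY"] += 1
    return counts


def dayhoff_recode(protein, unknown="?"):
    """Recode a protein string into Dayhoff categories (hypothetical helper).

    Gap characters ("-") are kept; any other unrecognized symbol becomes `unknown`.
    """
    recoded = []
    for aa in protein.upper():
        if aa == "-":
            recoded.append("-")
        else:
            recoded.append(DAYHOFF_MAP.get(aa, unknown))
    return "".join(recoded)


if __name__ == "__main__":
    print(serine_family_counts("TCTAGCTCGATG"))
    print(dayhoff_recode("STC-X"))
```

Under this grouping serine and threonine fall into the same category while cysteine has its own. A serine-threonine-serine path between the two codon families therefore becomes invisible after recoding, but a path through cysteine does not, which is consistent with the description's wording that recoding would only partially ameliorate the bias.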

Created:

2012-10-12
