Data from: Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study

Source: DataONE. Updated: 2012-10-12. Indexed: 2024-06-27.

Download link:

https://search.dataone.org/view/null

Description:

Phylogenomic analyses of ancient relationships are usually performed using amino acid data, but it is unclear whether amino acids or nucleotides should be preferred. With the twofold aim of addressing this problem and clarifying pancrustacean relationships, we explored the signals in the 62 protein-coding genes carefully assembled by Regier et al. in 2010. With reference to the pancrustaceans, this data set yields a highly supported nucleotide tree that differs substantially from the corresponding, but poorly supported, amino acid tree. We show that the discrepancy between the nucleotide-based and the amino acid-based trees is caused by substitutions within synonymous codon families (especially the two serine families, TCN and AGY). We show that different arthropod lineages are differentially biased in their usage of serine, arginine, and leucine synonymous codons, and that the serine bias is correlated with the topology derived from the nucleotides, but not with the one derived from the amino acids. We suggest that a parallel, partially compositionally driven, synonymous codon-usage bias affects the nucleotide topology. As substitutions between serine codon families can proceed through threonine or cysteine intermediates, amino acid data sets might also be affected by the serine codon-usage bias. We suggest that a Dayhoff recoding strategy would partially ameliorate the effects of such bias. Although amino acids provide an alternative hypothesis of pancrustacean relationships, neither the nucleotide nor the amino acid version of this data set seems to contain enough genuine phylogenetic information to robustly resolve the relationships within this group, which should still be considered unresolved.
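
Illustration (not part of the deposited data): the description mentions two concrete operations, counting serine codons by family (TCN versus AGY) and Dayhoff recoding of amino acids. The Python sketch below shows one minimal way to do both. It assumes the standard genetic code and the commonly used six-category Dayhoff grouping (AGPST, DENQ, HKR, ILMV, FWY, C). The function names, the category labels 1 to 6, and the "-" placeholder are hypothetical choices made for this example and may differ from what the authors used.

```python
from collections import Counter

# Six-category Dayhoff grouping (assumed; labels "1".."6" are arbitrary).
DAYHOFF_GROUPS = {
    "AGPST": "1",
    "DENQ": "2",
    "HKR": "3",
    "ILMV": "4",
    "FWY": "5",
    "C": "6",
}
# Flatten to a per-residue lookup table, e.g. {"A": "1", "G": "1", ...}.
DAYHOFF_MAP = {aa: label for group, label in DAYHOFF_GROUPS.items() for aa in group}

# The two serine codon families in the standard genetic code.
SERINE_TCN = {"TCA", "TCC", "TCG", "TCT"}
SERINE_AGY = {"AGC", "AGT"}


def dayhoff_recode(protein, unknown="-"):
    """Replace each amino acid by its Dayhoff category label.

    Symbols outside the 20 standard residues (gaps, X, ...) become `unknown`.
    """
    return "".join(DAYHOFF_MAP.get(aa, unknown) for aa in protein.upper())


def serine_family_counts(cds):
    """Count in-frame serine codons per family in a coding sequence.

    Reading starts at position 0; a trailing partial codon is ignored.
    """
    cds = cds.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in SERINE_TCN:
            counts["TCN"] += 1
        elif codon in SERINE_AGY:
            counts["AGY"] += 1
    return counts


if __name__ == "__main__":
    print(dayhoff_recode("STC"))
    print(serine_family_counts("TCAAGCTCTGGG"))
```

Under this grouping serine and threonine share a category while cysteine has its own, so recoding hides serine-threonine replacements but not serine-cysteine ones, which is consistent with the description calling the remedy partial.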

Created:

2012-10-12
