Data from: Mitochondrial phylogenomics of early land plants: mitigating the effects of saturation, compositional heterogeneity, and codon-usage bias
Source: DataCite Commons | Updated: 2025-04-01 | Indexed: 2025-04-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.7b470
Description:
Phylogenetic analyses using concatenation of genomic-scale data have been
seen as a panacea for resolving the incongruences among inferences from
few or single genes. However, phylogenomics may also suffer from
systematic errors, due to the, perhaps cumulative, effects of saturation,
among-taxa compositional (GC content) heterogeneity, or codon-usage bias
plaguing the individual nucleotide loci that are concatenated. Here we
provide an example of how these factors affect the inferences of the
phylogeny of early land plants based on mitochondrial genomic data.
Mitochondrial sequences evolve slowly in plants and hence are thought to
be suitable for resolving deep relationships. We newly assembled
mitochondrial genomes from 20 bryophytes and complemented these with 40
other streptophytes (land plants plus algal outgroups), compiling a data matrix
of 60 taxa and 41 mitochondrial genes. Homogeneous analyses of the
concatenated nucleotide data resolve mosses as sister-group to the
remaining land plants. However, the corresponding translated amino acid
data support the liverwort lineage in this position. Both results receive
weak to moderate support in maximum likelihood analyses, but strong
support in Bayesian inferences. Tests of alternative hypotheses using
either nucleotide or amino-acid data provide implicit support for the
respective optimal topologies. By analyzing the nucleotide data, we found
that the 3rd codon positions are more saturated than the 1st and 2nd codon
positions, and excluding these from the analyses leads to a topology
congruent with that obtained using amino-acid data. Further, we determined
that land plant lineages differ in their nucleotide composition, and in
their usage of synonymous codon variants. Composition-heterogeneous
Bayesian analyses employing a non-stationary model that accounts for
among-lineage variation in composition, and inferences from degenerated
nucleotide data that avoid the effects of synonymous mutations that
underlie codon-usage bias, again recovered liverworts as sister to the
remaining land plants. These analyses indicate that the discrepancy
between the nucleotide-based and the amino-acid-based trees is caused by
lineage-specific, parallel compositional bias or synonymous mutations
driving codon-usage bias, as well as saturation in the 3rd codon
positions. While genomic data may generate highly supported phylogenetic
trees, these inferences may be artifacts. We suggest that phylogenomic
analyses should assess the possible impact of potential biases through
comparisons of protein-coding gene data and their amino-acid
translations, by analyzing data with models that account for compositional
bias, and by excluding noisy nucleotide signals due to saturation or
codon-usage bias.
We caution against relying on any one presentation of the data (nucleotide
or amino acid) or any one type of analysis even when analyzing large-scale
data sets, no matter how well-supported, without fully exploring the
effects of substitution models.
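
The two simplest checks described above, dropping 3rd codon positions and comparing nucleotide composition among taxa, can be sketched in a few lines of Python. This is only an illustrative sketch and not the authors' pipeline: the function names, the toy alignment, and the taxon labels are hypothetical, and the sequences are assumed to be in-frame, aligned protein-coding DNA.

```python
# Illustrative sketch only; names and toy data are hypothetical,
# not taken from the dataset.

def codon_positions(seq, positions=(1, 2)):
    """Keep only the given codon positions (1-based) of an in-frame sequence."""
    keep = {p - 1 for p in positions}
    return "".join(base for i, base in enumerate(seq) if i % 3 in keep)


def gc_content(seq):
    """Fraction of G/C among unambiguous bases; gaps and ambiguity codes are ignored."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return float("nan")
    return sum(b in "GC" for b in bases) / len(bases)


# Toy in-frame alignment: taxon_A and taxon_B encode the same peptide (M A K F)
# but use different synonymous codons.
alignment = {
    "taxon_A": "ATGGCCAAATTT",
    "taxon_B": "ATGGCGAAGTTC",
    "taxon_C": "ATGGCAAAA---",
}

# Alignment with 3rd codon positions excluded.
first_second = {name: codon_positions(seq, (1, 2)) for name, seq in alignment.items()}

# GC content at 3rd codon positions, per taxon.
gc_third = {name: gc_content(codon_positions(seq, (3,))) for name, seq in alignment.items()}

for name in alignment:
    print(name, first_second[name], round(gc_third[name], 3))
```

In this toy case taxon_A and taxon_B become identical once 3rd positions are removed, while their 3rd-position GC content differs, which is the kind of synonymous, composition-driven signal the description warns about.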
Provider:
Dryad
Created:
2014-07-29



