Supplementary material for: Phylogenetic analysis of allotetraploid species using polarized genomic sequences
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-06-15.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.dfn2z353j
Description:
Phylogenetic analysis of polyploid hybrid species has long posed a
formidable challenge, as it requires distinguishing between alleles of
different ancestral origins in order to disentangle their individual
evolutionary histories. This problem has been previously
addressed by conceiving phylogenies as reticulate networks, using a
two-step phasing strategy that first identifies and segregates
homoeologous loci and then, during a second phasing step, assigns each
gene copy to one of the subgenomes of an allopolyploid species. Here, we
propose an alternative approach, one that preserves the core idea behind
phasing – to produce separate nucleotide sequences that capture the
reticulate evolutionary history of a polyploid – while vastly simplifying
its implementation by reducing a complex multi-stage procedure to a single
phasing step. While most current methods used for phylogenetic
reconstruction of polyploid species require sequencing reads to be
pre-phased using experimental or computational methods – usually an
expensive, complex, and/or time-consuming endeavor – phasing executed
using our algorithm is performed directly on the multiple-sequence
alignment (MSA), a key change that allows for the simultaneous segregation
and sorting of gene copies. We introduce the concept of genomic
polarization which, when applied to an allopolyploid species, produces
nucleotide sequences that capture the fraction of a polyploid genome that
deviates from that of a reference sequence, usually one of the other
species present in the MSA. We show that if the reference sequence is one
of the parental species, the polarized polyploid sequence has a close
resemblance (high pairwise sequence identity) to the second parental
species. This knowledge is harnessed to build a new heuristic algorithm
where, by replacing the allopolyploid genomic sequence in the MSA with its
polarized version, it is possible to identify the phylogenetic position of
the polyploid's ancestral parents in an iterative process. The
proposed methodology can be used with long-read as well as short-read
high-throughput sequencing (HTS) data, and requires only one
representative individual for each species to be included in the
phylogenetic analysis. In its current form, it can be used in the analysis
of phylogenies containing tetraploid and diploid species. We test the
newly developed method extensively using simulated data in order to
evaluate its accuracy. We show empirically that the use of polarized
genomic sequences allows for the correct identification of both parental
species of an allotetraploid with up to 97% certainty in phylogenies with
moderate levels of incomplete lineage sorting (ILS), and 87% in
phylogenies containing high levels of ILS. We then apply the polarization
protocol to reconstruct the reticulate histories of Arabidopsis kamchatica
and A. suecica, two allopolyploids whose ancestry has been well
documented.
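
The polarization idea described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the description does not spell out the exact rule. The sketch assumes that the allotetraploid is represented by a single consensus sequence in which sites that differ between subgenomes are written as two-base IUPAC ambiguity codes, and that polarizing against a reference means keeping, at each such site, the base the reference does not carry. The names `polarize` and `pairwise_identity` and the toy sequences are hypothetical.

```python
# Two-base IUPAC ambiguity codes and the nucleotides they stand for.
IUPAC_PAIRS = {
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def polarize(polyploid: str, reference: str) -> str:
    """Polarize an aligned polyploid sequence against a reference.

    Assumed rule (an illustration, not the published algorithm): at a site
    with a two-base ambiguity code, if the reference carries one of the two
    bases, keep the other one. All other sites are left unchanged.
    """
    if len(polyploid) != len(reference):
        raise ValueError("sequences must come from the same alignment")
    out = []
    for p, r in zip(polyploid.upper(), reference.upper()):
        bases = IUPAC_PAIRS.get(p)
        if bases is not None and r in bases:
            (other,) = bases - {r}  # the base that deviates from the reference
            out.append(other)
        else:
            out.append(p)
    return "".join(out)


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical sites among columns where both are A, C, G or T."""
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            matches += x == y
    return matches / compared if compared else 0.0


# Toy data: two diploid parents and the consensus of their allotetraploid.
parent_a = "ACGTACGT"
parent_b = "ACCTATGT"
tetraploid = "ACSTAYGT"  # S = C/G and Y = C/T where the parents differ

polarized = polarize(tetraploid, parent_a)
print(polarized, pairwise_identity(polarized, parent_b))
```

In this toy case, polarizing against one parent recovers the sequence of the other, which mirrors the statement that the polarized sequence closely resembles the second parental species. An iterative search could repeat this step with each candidate reference in the alignment and compare the resulting phylogenetic placements; that loop is left out here because the description does not specify it.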
Provider:
Dryad
Created:
2022-12-02



