five

Supplementary material for: Phylogenetic analysis of allotetraploid species using polarized genomic sequences

收藏
DataCite Commons2025-06-01 更新2025-06-15 收录
下载链接:
https://datadryad.org/dataset/doi:10.5061/dryad.dfn2z353j
下载链接
链接失效反馈
官方服务:
资源简介:
Phylogenetic analysis of polyploid hybrid species has long posed a formidable challenge as it requires the ability to distinguish between alleles of different ancestral origins in order to disentangle their individual evolutionary history. This problem has been previously addressed by conceiving phylogenies as reticulate networks, using a two-step phasing strategy that first identifies and segregates homoeologous loci and then, during a second phasing step, assigns each gene copy to one of the subgenomes of an allopolyploid species. Here, we propose an alternative approach, one that preserves the core idea behind phasing – to produce separate nucleotide sequences that capture the reticulate evolutionary history of a polyploid – while vastly simplifying its implementation by reducing a complex multi-stage procedure to a single phasing step. While most current methods used for phylogenetic reconstruction of polyploid species require sequencing reads to be pre-phased using experimental or computational methods – usually an expensive, complex, and/or time-consuming endeavor – phasing executed using our algorithm is performed directly on the multiple-sequence alignment (MSA), a key change that allows for the simultaneous segregation and sorting of gene copies. We introduce the concept of genomic polarization which, when applied to an allopolyploid species, produces nucleotide sequences that capture the fraction of a polyploid genome that deviates from that of a reference sequence, usually one of the other species present in the MSA. We show that if the reference sequence is one of the parental species, the polarized polyploid sequence has a close resemblance (high pairwise sequence identity) to the second parental species. This knowledge is harnessed to build a new heuristic algorithm where, by replacing the allopolyploid genomic sequence in the MSA by its polarized version, it is possible to identify the phylogenetic position of the polyploid's ancestral parents in an iterative process. The proposed methodology can be used with long-read as well as short-read high-throughput sequencing (HTS) data, and requires only one representative individual for each species to be included in the phylogenetic analysis. In its current form, it can be used in the analysis of phylogenies containing tetraploid and diploid species. We test the newly developed method extensively using simulated data in order to evaluate its accuracy. We show empirically that the use of polarized genomic sequences allows for the correct identification of both parental species of an allotetraploid with up to 97% certainty in phylogenies with moderate levels of incomplete lineage sorting (ILS), and 87% in phylogenies containing high levels of ILS. We then apply the polarization protocol to reconstruct the reticulate histories of Arabidopsis kamchatica and A. suecica, two allopolyploids whose ancestry has been well documented.
提供机构:
Dryad
创建时间:
2022-12-02
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作