Benefits and limits of phasing alleles for network inference of allopolyploid complexes
Source: DataCite Commons. Updated 2026-03-04. Indexed 2025-04-09.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.5qfttdz53
Description:
Accurately reconstructing the reticulate histories of polyploids remains a
central challenge for understanding plant evolution. Although phylogenetic
networks can provide insights into relationships among polyploid lineages,
inferring networks may be hindered by the complexities of homology
determination in polyploid taxa. We use simulations to show that phasing
alleles from allopolyploid individuals can improve phylogenetic network
inference under the multispecies coalescent, recovering the true network
with fewer loci than haplotype consensus sequences or sequences
with heterozygous bases represented as ambiguity codes. Phased allelic
data can also improve divergence time estimates for networks, which is
helpful for evaluating allopolyploid speciation hypotheses and proposing
mechanisms of speciation. To achieve these outcomes in empirical data, we
present a novel pipeline that leverages a recently developed phasing
algorithm to reliably phase alleles from polyploids. This pipeline is
especially appropriate for target enrichment data, where depth of coverage
is typically high enough to phase entire loci. We provide an empirical
example in the North American Dryopteris fern complex that demonstrates
insights from phased data as well as the challenges of network inference.
We establish that our pipeline (PATÉ: Phased Alleles from Target
Enrichment data) is capable of recovering a high proportion of phased loci
from both diploids and polyploids. These data may improve network
estimates compared to using haplotype consensus assemblies by accurately
inferring the direction of gene flow, but statistical non-identifiability
of phylogenetic networks poses a barrier to inferring the evolutionary
history of reticulate complexes.
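
The description compares three ways of representing a heterozygous locus: phased alleles, a haplotype consensus sequence, and a single sequence with ambiguity codes. The toy Python sketch below illustrates the difference on a made-up six-base locus. It is not part of PATÉ: the function names, the sequences, and the per-site read depths are hypothetical, and the consensus rule (keep the better-supported base at each site) is a simplifying assumption.

```python
# Standard IUPAC symbols for two-base heterozygous sites.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}


def ambiguity_coded(allele_a, allele_b):
    """Collapse two aligned alleles into one sequence with IUPAC codes.

    Assumes both alleles contain only A, C, G, T (no gaps or Ns).
    """
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return "".join(
        a if a == b else IUPAC[frozenset((a, b))]
        for a, b in zip(allele_a, allele_b)
    )


def majority_consensus(allele_a, allele_b, depth_a, depth_b):
    """Collapse two aligned alleles by keeping the better-supported base.

    depth_a and depth_b are hypothetical per-site read depths; ties keep
    the base from allele_a.
    """
    return "".join(
        a if da >= db else b
        for a, b, da, db in zip(allele_a, allele_b, depth_a, depth_b)
    )


# Phased data keep both alleles as separate sequences.
allele_a = "ACGTAC"
allele_b = "ATGTGC"

coded = ambiguity_coded(allele_a, allele_b)
cons = majority_consensus(
    allele_a, allele_b,
    depth_a=[10, 12, 10, 10, 4, 10],
    depth_b=[10, 8, 10, 10, 9, 10],
)
print(coded)  # AYGTRC
print(cons)   # ACGTGC
```

In this toy case the consensus sequence matches neither allele, and the ambiguity-coded sequence records that sites 2 and 5 vary but not which bases occur together. Only the phased pair retains that linkage information.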
Provider:
Dryad
Created:
2024-05-08



