
Unraveling the web of life: Incomplete lineage sorting and hybridization as primary mechanisms over polyploidization in the evolutionary dynamics of pear species

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.3ffbg79r8
Description:
The traditional Tree of Life (ToL) model is increasingly challenged by the Web of Life (WoL) paradigm, which offers a more accurate depiction of organismal phylogeny, particularly in light of the incongruences often observed between gene and species trees. However, the absence of a standardized method for resolving evolutionary mechanisms, including incomplete lineage sorting (ILS), hybridization, introgression, polyploidization, and whole-genome duplication, remains a significant obstacle in defining the WoL. Characterized by extensive hybridization events, the pear genus Pyrus provides an ideal model for exploring these complexities. In this study, we present a Step-by-Step Exclusion (SSE) approach for investigating the evolutionary pathways of Pyrus, and our results demonstrate that:

1) ILS, rather than polyploidization, played the dominant role in the origin of Pyrus;
2) the two subgenera of Pyrus followed independent evolutionary paths, influenced by geographical barriers formed through the uplift of the Tibetan Plateau and increased aridity in Central Asia;
3) both ILS and hybridization have driven the diversification of subg. Pashia, while hybridization alone has shaped the reticulate evolution of subg. Pyrus;
4) the establishment of the Silk Road during the Han Dynasty facilitated genetic exchange between subg. Pyrus and subg. Pashia.

The SSE approach offers a versatile framework for studying the evolutionary mechanisms underlying the WoL paradigm.

Materials and Methods

This section gives a brief overview of the materials and methods used in this project; additional details can be found in the Supplementary Methods.

1 | Taxon Sampling, DNA Extraction, and Sequencing

A comprehensive phylogenetic analysis of Pyrus was conducted, including all seven subsections defined by Phipps et al. (1990), the most detailed global taxonomic framework for pear species. Because Pyrus is self-incompatible and readily forms hybrids, species such as P. sinkiangensis were excluded. To accurately estimate divergence times, 41 outgroups from 26 genera in the apple tribe Maleae were selected, with a focus on close relatives such as Malus and Sorbus sensu lato. Gillenia trifoliata from the tribe Gillenieae was used as an outgroup for rooting the Maleae phylogeny. Of the 92 samples analyzed, 58 came from Whole Genome Sequencing (WGS) and/or Deep Genome Skimming (DGS) data in the NCBI Sequence Read Archive (SRA), while 34 DGS datasets (2 × 150 bp) were generated in this study (see Table S1 for accession details).

DNA was extracted from silica-gel dried leaves and herbarium specimens using a modified CTAB method (mCTAB; Li et al., 2013). DNA quality was checked via agarose gel electrophoresis, and high-quality samples were sent to Novogene in Beijing, where libraries were prepared with the NEBNext® Ultra™ II DNA Library Prep Kit and sequenced on the Illumina NovaSeq platform (2 × 150 bp).

2 | Reads Processing, Plastome Assembly, and Annotation

Raw sequencing reads were processed using Trimmomatic v. 0.39 (Bolger et al., 2014) to remove adapters and low-quality sequences. Quality control of the trimmed reads was performed with FastQC v. 0.11.9 (Andrews, 2010) to ensure the reliability and accuracy of the data for subsequent analyses.

For plastome assembly, we employed the Successive Approach combining Reference-based and De novo assembly (SARD; Liu et al., 2023); this method has been successfully used in various lineages of angiosperms (e.g., Liu et al., 2021, 2022, 2023; Jin et al., 2023, 2024; Wang et al., 2024). The assembled plastomes were annotated using the Plastid Genome Annotator (PGA; Qu et al., 2019). The chloroplast genomes were then visualized in Geneious Prime (Kearse et al., 2012) to verify the start and stop codons of each coding gene, and manual corrections were made where necessary.
All assembled plastomes were submitted to GenBank, and their accession numbers are listed in Table S1.

3 | Single-copy Nuclear Marker Development and Sequence Assembly

We used the 801 single-copy nuclear (SCN) genes developed in previous Maleae studies (Jin et al., 2023, 2024). Nuclear loci were assembled using HybPiper v. 2.0.1 (Johnson et al., 2016), an integrated bioinformatics suite. Lineage-specific SCN sequences were extracted from NGS reads using default parameters. The ‘hybpiper assemble’ command mapped and sorted trimmed reads against the SCN gene references with BWA v. 0.7.17 (Li and Durbin, 2009) and SAMtools v. 1.17 (Li et al., 2009). Sorted reads were assembled into contigs using SPAdes v. 3.15.5 (Bankevich et al., 2012) with a coverage cutoff of 5. Gene recovery efficiency across species was summarized and visualized using the ‘hybpiper stats’ and ‘hybpiper recovery_heatmap’ commands. The ‘hybpiper paralog_retriever’ command was used to detect and exclude paralogs or chimeric sequences that could affect orthology inference.

4 | Orthology Inference for the Nuclear Genes

To accurately estimate phylogeny from orthologs, we applied the tree-based orthology inference method to the SCN genes (Yang and Smith, 2014), generating two ortholog datasets: Monophyletic Outgroup (MO) and RooTed Ingroup (RT). The procedures are outlined in Morales-Briones et al. (2022). Uneven sequencing coverage led to outlier loci and short sequences, which can affect phylogenetic inference. To address this, our PhyloAI team developed a pipeline to refine these sequences, as detailed in the Supplementary Methods.

5 | Multiple Inference Methods for Phylogenetic Analyses

This study used multiple phylogenetic methods, both concatenation-based and coalescent-based, to estimate the plastid and nuclear phylogenies of Pyrus. For concatenated supermatrices derived from the nuclear datasets, orthologs were combined using AMAS v. 1.0 (Borowiec, 2016).
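As a rough illustration of what combining ortholog alignments into a supermatrix involves, the following Python sketch joins per-locus alignments, pads taxa missing from a locus with gaps, and records the partition boundaries. It is not AMAS itself: the function name `concatenate_alignments`, the gap-padding choice, and the toy loci are all hypothetical and serve only to show the idea.

```python
from typing import Dict, List, Tuple

Alignment = Dict[str, str]  # taxon name -> aligned sequence


def concatenate_alignments(
    loci: Dict[str, Alignment], missing: str = "-"
) -> Tuple[Alignment, List[Tuple[str, int, int]]]:
    """Join per-locus alignments into one supermatrix.

    Returns the supermatrix and a list of (locus, start, end) partitions
    with 1-based inclusive coordinates.
    """
    taxa = sorted({taxon for aln in loci.values() for taxon in aln})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0
    for name in sorted(loci):
        aln = loci[name]
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"locus {name!r} is empty or not aligned")
        length = lengths.pop()
        for taxon in taxa:
            # Taxa absent from this locus are filled with the missing symbol.
            pieces[taxon].append(aln.get(taxon, missing * length))
        partitions.append((name, offset + 1, offset + length))
        offset += length
    supermatrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return supermatrix, partitions


# Toy data (hypothetical, for illustration only).
toy_loci = {
    "locus1": {"Pyrus_A": "ACGT", "Pyrus_B": "ACGA", "Malus": "ACTT"},
    "locus2": {"Pyrus_A": "GGC", "Malus": "GAC"},
}
supermatrix, partitions = concatenate_alignments(toy_loci)
```

The partition list is what a partitioning tool would then consume as its starting scheme, one block per locus.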
The optimal partitioning scheme and evolutionary models were determined with PartitionFinder2 (Stamatakis, 2006; Lanfear et al., 2017) using AICc and the rcluster algorithm. Maximum Likelihood (ML) inference was performed with IQ-TREE2 v. 2.1.3 (Minh et al., 2020), with 1000 replicates each for the SH approximate likelihood ratio test and the ultrafast bootstrap, and with RAxML v. 8.2.12 (Stamatakis, 2014), using the GTRGAMMA model and 200 bootstrap replicates.

The boundaries of the two inverted repeats in each plastome were verified using the Repeat Finder plugin of Geneious Prime (Kearse et al., 2012), and one copy of the repeat was removed. The 77 plastid coding sequences (plastid CDSs) were extracted, aligned with MAFFT v. 7.475, and concatenated in AMAS v. 1.0. The resulting supermatrix was analyzed with PartitionFinder2 for partitioning and nucleotide model selection, using the greedy algorithm. ML phylogenetic inference followed the parameters of the nuclear analysis.

A coalescent-based approach using ASTRAL-III (Zhang et al., 2018) was applied to the nuclear and plastid CDS datasets. Individual gene trees were estimated with RAxML v. 8.2.12 (Stamatakis, 2014) and refined by collapsing branches with bootstrap values below 10. The refined gene trees were then summarized into a species tree using ASTRAL-III. All nine resulting trees and the related log files are available at the Dryad Digital Repository (https://doi.org/10.5061/dryad.3ffbg79r8) for further analysis.

6 | Detecting and Visualizing Nuclear Gene Tree Discordance

We used several methods to evaluate gene tree congruence with the inferred phylogeny. First, we applied phyparts (Smith et al., 2015) to examine phylogenetic conflict by mapping gene trees to the target tree and quantifying discordant and congruent bipartitions. Both the gene trees and the target tree were rooted using the ‘pxrr’ command in phyx (Brown et al., 2017), followed by a full concordance analysis (-a 1) in phyparts.
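To make the notion of concordant and conflicting bipartitions concrete, here is a minimal Python sketch that classifies rooted gene trees against a single clade of the target tree. It is a simplified illustration and not phyparts: clades are typed in by hand as taxon sets instead of being parsed from tree files, bootstrap filtering is omitted, and the names `classify_gene_tree` and `toy_gene_trees` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Set


def classify_gene_tree(
    target_clade: Set[str],
    gene_clades: Iterable[FrozenSet[str]],
    gene_taxa: Set[str],
) -> str:
    """Return 'concordant', 'conflicting', or 'uninformative' for one gene tree."""
    # Restrict the target clade to the taxa actually sampled in this gene tree.
    clade = target_clade & gene_taxa
    if len(clade) < 2 or not (gene_taxa - clade):
        return "uninformative"
    gene_clades = list(gene_clades)
    if any(set(c) == clade for c in gene_clades):
        return "concordant"
    for c in gene_clades:
        # Two rooted clades conflict when they overlap but neither contains the other.
        if c & clade and c - clade and clade - c:
            return "conflicting"
    return "uninformative"


# Toy gene trees (hypothetical): each entry is (taxa, non-trivial clades).
toy_gene_trees = [
    ({"A", "B", "C", "D"}, [frozenset("AB"), frozenset("ABC")]),  # ((A,B),C),D
    ({"A", "B", "C", "D"}, [frozenset("AC"), frozenset("ABC")]),  # ((A,C),B),D
    ({"A", "C", "D"}, [frozenset("AC")]),                         # (A,C),D with B missing
    ({"A", "B", "C", "D"}, [frozenset("ABC")]),                   # (A,B,C),D polytomy
]
summary = Counter(
    classify_gene_tree({"A", "B"}, clades, taxa) for taxa, clades in toy_gene_trees
)
```

Counts of this kind, tallied per node of the target tree, are what the pie charts summarize.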
Nodes with bootstrap support below 50% were excluded. A rapid concordance analysis (-a 0) was also conducted to mitigate the impact of missing taxa caused by uneven nuclear gene recovery. The results were combined and visualized as pie charts showing the proportion of discordant and congruent topologies at each node. Additionally, we calculated the internode certainty all (ICA) value to summarize node inconsistency.

We also used Quartet Sampling (QS; Pease et al., 2018) to assess conflicting support at weakly supported nodes. QS, performed with 100 replicates and a log-likelihood threshold of 2, evaluates the reliability of internal tree relationships and terminal branches.

7 | SNP Calling and Gene Flow Analyses

The latest high-quality genome assembly of Pyrus pyrifolia (GCA_016587475.1) was downloaded as the reference for single nucleotide polymorphism (SNP) calling. Clean reads from each sample were mapped to this reference using BWA v. 0.7.17 (Li and Durbin, 2009), and the alignments were processed into BAM files with SAMtools v. 1.6 (Li et al., 2009). Duplicate reads were marked and variants were called with GATK (McKenna et al., 2010) using the ‘MarkDuplicates’ and ‘HaplotypeCaller’ functions. Per-sample haplotype calls were combined with ‘CombineGVCFs’ and genotyped with ‘GenotypeGVCFs’ to generate genotype files (gVCF). Two-step filtering was applied: initial filtering with GATK and secondary filtering with VCFTOOLS (Danecek et al., 2011) to select the final variant sites.

We calculated the f4-ratio using Dsuite v. 0.5 (Malinsky et al., 2021) to explore gene flow between species. SNPs served as the input data, with an ASTRAL-derived species tree as the guide tree. Visualization was done using the ‘plot_f4ratio.rb’ Ruby script. Parallel analyses with different outgroups (Malus or Sorbus) gave highly consistent results regardless of the outgroup selected.
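The idea behind the f4-ratio can be illustrated with population allele frequencies. The sketch below uses the classic definition from the admixture literature, α = f4(A, O; X, C) / f4(A, O; B, C), where X is the putatively admixed lineage, B and C are the two source lineages, A is a sister lineage of B, and O is the outgroup. Dsuite works from a VCF and a guide tree and differs in detail, so this illustrates the statistic under stated assumptions and does not reproduce the analysis. The function names and all allele frequencies are made up for the example.

```python
from typing import Sequence


def f4(p1: Sequence[float], p2: Sequence[float],
       p3: Sequence[float], p4: Sequence[float]) -> float:
    """f4(P1, P2; P3, P4): mean over sites of (p1 - p2) * (p3 - p4)."""
    if not (len(p1) == len(p2) == len(p3) == len(p4)) or len(p1) == 0:
        raise ValueError("frequency vectors must be non-empty and of equal length")
    return sum((a - b) * (c - d) for a, b, c, d in zip(p1, p2, p3, p4)) / len(p1)


def f4_ratio(p_a, p_o, p_x, p_b, p_c) -> float:
    """Estimate the proportion of B-related ancestry in X."""
    denominator = f4(p_a, p_o, p_b, p_c)
    if denominator == 0:
        raise ZeroDivisionError("f4(A, O; B, C) is zero; the ratio is undefined")
    return f4(p_a, p_o, p_x, p_c) / denominator


# Hypothetical allele frequencies at four SNPs.
p_a = [0.9, 0.2, 0.6, 0.1]
p_o = [0.4, 0.4, 0.5, 0.3]
p_b = [0.8, 0.3, 0.7, 0.2]
p_c = [0.3, 0.6, 0.4, 0.5]

# Build X as an exact 30% B / 70% C mixture so the expected answer is known.
true_alpha = 0.3
p_x = [true_alpha * b + (1 - true_alpha) * c for b, c in zip(p_b, p_c)]

alpha_hat = f4_ratio(p_a, p_o, p_x, p_b, p_c)
```

Because X is constructed here as an exact mixture, the estimate recovers the mixing proportion up to floating-point error; with real data, drift and sampling noise make it an estimate with an associated uncertainty.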
8 | Incomplete Lineage Sorting Analyses

To investigate the role of incomplete lineage sorting (ILS) in shaping the evolutionary trajectory of Pyrus, we employed two approaches, based on the population mutation parameter theta (Cai et al., 2021) and on coalescent simulation (Liu et al., 2022). In the first approach, theta was calculated by dividing the branch length in mutation units, inferred by IQ-TREE, by the branch length in coalescent units from ASTRAL-III. The ASTRAL-III tree was used as a fixed topology to ensure consistency between trees. Additionally, we examined the correlation between branch lengths and ICA values from ASTRAL-III to assess the impact of ILS; a strong positive correlation suggests ILS as a driver of tree conflicts (Zhou et al., 2022).

In the second approach, coalescent simulations were conducted using a dataset of 29 high-quality samples. Gene trees were extracted for these samples, and a species tree was inferred using ASTRAL-III. Phybase v. 1.5 (Liu and Yu, 2010) was used to simulate 10,000 gene trees under the multi-species coalescent model. The distances between simulated gene trees, empirical gene trees, and the species tree were calculated using DendroPy v. 4.5.2 (Sukumaran and Holder, 2010) and compared visually. The disparity between the distance distributions was analyzed to assess the contribution of ILS to gene tree incongruence.

9 | Polyploidy Analyses

We integrated multiple sources of evidence to explore the potential impact of polyploidy and whole-genome duplication (WGD) on phylogenetic discrepancies in Pyrus. First, a comprehensive literature review was conducted to gather chromosome-related data, including ploidy levels for all Pyrus species, primarily sourced from the International Plant Chromosome Number (IPCN) database (http://legacy.tropicos.org/Project/IPCN).
Additionally, Smudgeplot (Ranallo-Benavidez et al., 2020) was used to infer ploidy by analyzing k-mers within sequencing reads, allowing the identification of potential polyploid species.

To investigate the occurrence of polyploidization and WGD events in the deep phylogeny of Pyrus, we followed the methodology outlined by Morales-Briones et al. (2022). This involved extracting rooted ortholog trees from homolog trees, with a minimum bootstrap value of 50% per tree as the filtering criterion. Gene duplications identified in these trees were mapped onto a rooted species tree, and the proportion of gene duplications at the node corresponding to each most recent common ancestor (MRCA) was recorded. The computational scripts for this analysis are available at https://bitbucket.org/blackrim/clustering.

10 | Inference of Global Split Networks

SplitsTree is well suited to computing global split networks, particularly for deriving unrooted phylogenetic networks from molecular sequence data. Using methods such as split decomposition, neighbor-net, and consensus networks, we employed SplitsTree v. 4.19.0 (Huson and Bryant, 2006) to investigate the complex evolutionary trajectory of Pyrus. Our primary dataset consisted of well-aligned SCN genes from the MO dataset, which included 50 Pyrus samples and a Malus outgroup. The implicit network was inferred using uncorrected_P distances, the EqualAngle network construction algorithm, and the NeighborNet method.

11 | Phylogenetic Network Analyses

To explore the complex reticulate evolutionary processes in Pyrus, we used the Species Networks applying Quartets (SNaQ) algorithm within PhyloNetworks (Solís-Lemus et al., 2017). This software, built on maximum pseudolikelihood methods, efficiently infers phylogenetic networks from multi-locus datasets, particularly when scaling the number of taxa or hybridization events.
We considered gene flow and ILS as potential sources of discordance among gene trees. To manage computational demands, three datasets were created, each with fewer than 20 samples: (1) “Maleae 15-taxa data,” testing for hybrid origins with 15 species from genera such as Malus and Sorbus; (2) “Pyrus 16-taxa data,” representing 15 P. subg. Pyrus species with a Malus outgroup; and (3) “Pashia 15-taxa data,” including 14 P. subg. Pashia taxa and a Malus outgroup. The selected species either represent major lineages or are known for cytonuclear discordance, ensuring a broad view of genetic diversity in Pyrus.

For each dataset, quartet concordance factors (CFs) were computed from the SCN gene trees using the ‘readTrees2CF’ function. Species trees were reconstructed with ASTRAL-III (Zhang et al., 2018), and the CFs were used to infer the optimal phylogenomic network. The maximum number of reticulation events (hmax) ranged from 0 to 6, with inheritance probabilities calculated for hybridization edges. The optimal network was identified by selecting the hmax value at which the pseudo-deviance score plateaued and remained consistent across multiple runs.

12 | Dating Analysis and Ancestral Area Reconstruction

Dating analyses of the tribe Maleae, and of the genus Pyrus in particular, have been notably scarce, so the stem age of Pyrus has remained imprecisely determined. To address this gap, we adopted a two-step strategy for divergence time estimation within Pyrus. Although various Pyrus fossils have been discovered across different epochs and localities (Table S2), most are leaf specimens, and leaf morphology alone is insufficient for accurate species identification. Additionally, the generic classification of some leaf fossils remains ambiguous, making it difficult to distinguish between Malus, Pyrus, and other related genera within Maleae. We first used MCMCTree, a program implemented in PAML v. 4.9j (Yang, 2007), to estimate divergence times across the Maleae phylogenetic backbone, incorporating two fossil species. The inferred stem age of Pyrus was used in our subsequent analyses. In the second step, we employed BEAST2 (Bouckaert et al., 2014) to refine the divergence time estimates among Pyrus species, using the stem age of Pyrus estimated in the first step as a secondary calibration point. The detailed parameter settings for these two programs can be found in the Supplementary Methods.

The biogeographic analysis of Pyrus was conducted using BioGeoBEARS v. 1.1.1 (Matzke, 2018), integrated within RASP v. 4.2 (Yu et al., 2015). The time tree, inferred by PAML after excluding all outgroup taxa, was used as the input tree. Based on the distribution patterns of the extant Pyrus species and the paleotectonic histories of the continents, we divided the geographic range into three regions: (A) East Asia, (B) Central and West Asia, and (C) Europe and Northern Africa. The maximum number of areas assignable to any node was restricted to two. The optimal biogeographic model was the one with the highest AICc weight (AICc_wt), the model weight derived from the Akaike Information Criterion corrected for small sample sizes, after evaluating all models available in BioGeoBEARS.
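Model selection by AICc weight can be sketched as follows. The formulas are the standard ones: AICc = -2 lnL + 2k + 2k(k + 1)/(n - k - 1), and each model's weight is exp(-Δ/2) normalized over all models, where Δ is its AICc minus the smallest AICc. Lower AICc is better, so the model with the highest weight is the one with the lowest AICc. The log-likelihoods and the sample size `n` below are hypothetical values chosen for illustration, not results from this study; the parameter counts assume two free parameters per base model and one more for the +J variants.

```python
import math
from typing import Dict, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """Akaike Information Criterion corrected for small sample sizes."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return -2.0 * log_likelihood + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def aicc_weights(models: Dict[str, Tuple[float, int]], n: int) -> Dict[str, float]:
    """Map each model name to its AICc weight; models maps name -> (lnL, k)."""
    scores = {name: aicc(lnl, k, n) for name, (lnl, k) in models.items()}
    best = min(scores.values())
    relative = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}


# Hypothetical log-likelihoods and parameter counts (illustration only).
toy_models = {
    "DEC": (-60.0, 2),
    "DEC+J": (-55.0, 3),
    "DIVALIKE": (-58.0, 2),
    "DIVALIKE+J": (-54.5, 3),
    "BAYAREALIKE": (-70.0, 2),
    "BAYAREALIKE+J": (-56.0, 3),
}
weights = aicc_weights(toy_models, n=50)  # n is an assumed sample size
best_model = max(weights, key=weights.get)
```

What counts as the sample size `n` in a biogeographic analysis is itself a modeling choice, so the value used should be reported alongside the weights.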

Created: 2025-08-29