Data from: Targeted enrichment of large gene families for phylogenetic inference: phylogeny and molecular evolution of photosynthesis genes in the Portullugo clade (Caryophyllales)
Source: DataONE. Updated: 2017-09-19. Indexed: 2024-06-26.
Download link:
https://search.dataone.org/view/null
Description:
Hybrid enrichment is an increasingly popular approach for obtaining hundreds of loci for phylogenetic analysis across many taxa quickly and cheaply. The genes targeted for sequencing are typically single-copy loci, which facilitate a more straightforward sequence assembly and homology assignment process. However, this approach limits the inclusion of most genes of functional interest, which often belong to multi-gene families. Here we demonstrate the feasibility of including large gene families in hybrid enrichment protocols for phylogeny reconstruction and subsequent analyses of molecular evolution, using a new set of bait sequences designed for the “portullugo” (Caryophyllales), a moderately sized lineage of flowering plants (∼2200 species) that includes the cacti and harbors many evolutionary transitions to C4 and CAM photosynthesis. Including multi-gene families allowed us to simultaneously infer a robust phylogeny and construct a dense sampling of sequences for a major enzyme of C4 and CAM photosynthesis, which revealed the accumulation of adaptive amino acid substitutions associated with C4 and CAM origins in particular paralogs. Our final set of matrices for phylogenetic analyses included 75–218 loci across 74 taxa, with ∼50% matrix completeness across datasets. Phylogenetic resolution was greatly improved across the tree, at both shallow and deep levels. Concatenation and coalescent-based approaches both resolve the sister lineage of the cacti with strong support: Anacampserotaceae + Portulacaceae, two lineages of mostly diminutive succulent herbs of warm, arid regions. In spite of this congruence, BUCKy concordance analyses demonstrated strong and conflicting signals across gene trees. Our results add to the growing number of examples illustrating the complexity of phylogenetic signals in genomic-scale data.
Created:
2017-09-19



