Data from: Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants
DataONE | Updated 2015-07-01 | Indexed 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Background: The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict have been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets.
Results: Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone.
Conclusion: This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts (https://bitbucket.org/blackrim/phyparts), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
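The bipartition bookkeeping described above can be illustrated with a minimal Python sketch. It is not phyparts code and does not reproduce its options or output. The function names `classify` and `internode_certainty_all` are hypothetical, bipartitions are assumed to be given as pairs of taxon-name sets, and the score uses the normalized-entropy form commonly given for internode certainty over all conflicting bipartitions, without any frequency threshold or sign convention.

```python
from math import log


def classify(ref, other):
    """Classify bipartition `other` against reference bipartition `ref`.

    Each bipartition is a pair (side_a, side_b) of taxon-name sets.
    Both are restricted to the taxa they share before comparison.
    (Hypothetical helper for illustration; not part of phyparts.)
    """
    ref_a, ref_b = map(frozenset, ref)
    oth_a, oth_b = map(frozenset, other)
    shared = (ref_a | ref_b) & (oth_a | oth_b)
    ra, rb = ref_a & shared, ref_b & shared
    oa, ob = oth_a & shared, oth_b & shared
    # A split with fewer than two taxa on a side says nothing about grouping.
    if min(len(ra), len(rb), len(oa), len(ob)) < 2:
        return "uninformative"
    if {ra, rb} == {oa, ob}:
        return "concordant"
    # Two splits fit on one tree iff some pairwise intersection is empty.
    if any(not (x & y) for x in (ra, rb) for y in (oa, ob)):
        return "compatible"
    return "conflicting"


def internode_certainty_all(counts):
    """Entropy-style score: 1 + sum(p_i * log_n(p_i)) over n bipartitions.

    `counts` holds the number of gene trees supporting the bipartition of
    interest and each conflicting alternative (assumed input layout).
    Returns 1.0 when there is no conflict and 0.0 when support is even.
    """
    counts = [c for c in counts if c > 0]
    n = len(counts)
    if n < 2:
        return 1.0
    total = sum(counts)
    return 1.0 + sum((c / total) * log(c / total, n) for c in counts)


if __name__ == "__main__":
    species_split = ({"A", "B"}, {"C", "D", "E"})
    gene_splits = [
        ({"A", "B"}, {"C", "D", "E"}),
        ({"A", "C"}, {"B", "D", "E"}),
        ({"A", "B", "C"}, {"D", "E"}),
    ]
    for split in gene_splits:
        print(classify(species_split, split))
    print(internode_certainty_all([8, 2]))
```

Restricting both bipartitions to their shared taxa is one simple way to compare gene trees with incomplete sampling; a real analysis would also need to handle rooting and duplicated gene copies, which this sketch leaves out.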
Created:
2015-07-01



