Data from: Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants
Source: DataONE. Updated 2015-07-01; indexed 2024-06-27.
Download link:
https://search.dataone.org/view/null
Resource description:
Background: The use of transcriptomic and genomic datasets for phylogenetic reconstruction has become increasingly common as researchers attempt to resolve recalcitrant nodes with increasing amounts of data. The large size and complexity of these datasets introduce significant phylogenetic noise and conflict into subsequent analyses. The sources of conflict may include hybridization, incomplete lineage sorting, or horizontal gene transfer, and may vary across the phylogeny. For phylogenetic analysis, this noise and conflict have been accommodated in one of several ways: by binning gene regions into subsets to isolate consistent phylogenetic signal; by using gene-tree methods for reconstruction, where conflict is presumed to be explained by incomplete lineage sorting (ILS); or through concatenation, where noise is presumed to be the dominant source of conflict. The results provided herein emphasize that analysis of individual homologous gene regions can greatly improve our understanding of the underlying conflict within these datasets.
Results: Here we examined two published transcriptomic datasets, the angiosperm group Caryophyllales and the aculeate Hymenoptera, for the presence of conflict, concordance, and gene duplications in individual homologs across the phylogeny. We found significant conflict throughout the phylogeny in both datasets and in particular along the backbone. While some nodes in each phylogeny showed patterns of conflict similar to what might be expected with ILS alone, the backbone nodes also exhibited low levels of phylogenetic signal. In addition, certain nodes, especially in the Caryophyllales, had highly elevated levels of strongly supported conflict that cannot be explained by ILS alone.
Conclusion: This study demonstrates that phylogenetic signal is highly variable in phylogenomic data sampled across related species and poses challenges when conducting species tree analyses on large genomic and transcriptomic datasets. Further insight into the conflict and processes underlying these complex datasets is necessary to improve and develop adequate models for sequence analysis and downstream applications. To aid this effort, we developed the open source software phyparts (https://bitbucket.org/blackrim/phyparts), which calculates unique, conflicting, and concordant bipartitions, maps gene duplications, and outputs summary statistics such as internode certainty (ICA) scores and node-specific counts of gene duplications.
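To illustrate what counting concordant and conflicting bipartitions involves, here is a minimal Python sketch. It is not phyparts code and does not use its interface. The function names (`bipartition`, `relation`, `tally`, `certainty_score`) are hypothetical, every tree is assumed to share the same taxon set, and the certainty score follows the commonly cited entropy-style form 1 + Σ p·log_n(p), with a negative sign when the focal bipartition is not the most frequent. That formula and sign convention are assumptions here and may differ from what phyparts actually computes.

```python
from collections import Counter
from math import log


def bipartition(clade, taxa):
    """Canonical bipartition: an unordered pair {clade, complement}."""
    clade = frozenset(clade)
    return frozenset({clade, frozenset(taxa) - clade})


def relation(bp1, bp2):
    """Classify two bipartitions defined on the same taxon set."""
    if bp1 == bp2:
        return "concordant"
    # Two bipartitions conflict when every side of one overlaps
    # every side of the other (all four intersections non-empty).
    if all(a & b for a in bp1 for b in bp2):
        return "conflicting"
    return "compatible"


def tally(species_bp, gene_bps):
    """Count how gene-tree bipartitions relate to one species-tree bipartition."""
    return Counter(relation(species_bp, g) for g in gene_bps)


def certainty_score(focal_count, conflict_counts):
    """Entropy-style certainty (assumed form): 1 + sum(p * log_n(p)).

    focal_count     -- gene trees supporting the focal bipartition
    conflict_counts -- one count per distinct conflicting bipartition
    """
    conflicts = [c for c in conflict_counts if c > 0]
    counts = [c for c in [focal_count] + conflicts if c > 0]
    if not counts:
        raise ValueError("no informative gene trees for this node")
    n = len(counts)
    if n == 1:
        score = 1.0  # only one bipartition observed, so no entropy
    else:
        total = sum(counts)
        score = 1.0 + sum((c / total) * log(c / total, n) for c in counts)
    # Assumed sign convention: negative when a conflict outnumbers the focal split.
    if conflicts and max(conflicts) > focal_count:
        score = -score
    return score


# Hypothetical five-taxon example.
taxa = {"A", "B", "C", "D", "E"}
species_bp = bipartition({"A", "B"}, taxa)
gene_bps = [
    bipartition({"A", "B"}, taxa),       # same split as the species tree
    bipartition({"A", "C"}, taxa),       # contradicts AB | CDE
    bipartition({"A", "B", "C"}, taxa),  # different split, but compatible
]
print(tally(species_bp, gene_bps))
print(certainty_score(10, [3, 2]))
```

Real phylogenomic datasets rarely have identical taxon sampling in every gene tree, so a practical tool would also need to restrict each comparison to the taxa shared between trees. The sketch leaves that step out.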
Created:
2015-07-01



