Data from: Selecting question-specific genes to reduce incongruence in phylogenomics: a case study of jawed vertebrate backbone phylogeny

DataONE · Updated 2015-08-17 · Indexed 2024-06-27

Download link:

https://search.dataone.org/view/null

Resource description:

Incongruence between different phylogenomic analyses is the main challenge faced by phylogeneticists in the genomic era. To reduce incongruence, phylogenomic studies normally adopt some data filtering approaches, such as reducing missing data or using slowly evolving genes, to improve the signal quality of data. Here, we assembled a phylogenomic data set of 58 jawed vertebrate taxa and 4682 genes to investigate the backbone phylogeny of jawed vertebrates under both concatenation and coalescent-based frameworks. To evaluate the efficiency of extracting phylogenetic signals among different data filtering methods, we chose six highly intractable internodes within the backbone phylogeny of jawed vertebrates as our test questions. We found that our phylogenomic data set exhibits substantial conflicting signal among genes for these questions. Our analyses showed that non-specific data sets that are generated without bias toward specific questions are not sufficient to produce consistent results when there are several difficult nodes within a phylogeny. Moreover, phylogenetic accuracy based on non-specific data is considerably influenced by the size of data and the choice of tree inference methods. To address such incongruences, we selected genes that resolve a given internode but not the entire phylogeny. Notably, not only can this strategy yield correct relationships for the question, but it also reduces inconsistency associated with data sizes and inference methods. Our study highlights the importance of gene selection in phylogenomic analyses, suggesting that simply using a large amount of data cannot guarantee correct results. Constructing question-specific data sets may be more powerful for resolving problematic nodes.
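The idea of building a question-specific data set can be illustrated with a minimal sketch. It assumes each gene tree has already been reduced to the set of clades it contains, and it keeps only the genes whose tree contains one target clade. The function name `genes_supporting_clade`, the toy taxa `A` to `D`, and the gene names are hypothetical; the selection criterion shown here is an assumption for illustration and is not necessarily the one used to build this data set.

```python
from typing import Dict, FrozenSet, List, Set

# A clade is represented as the set of taxon names it contains (assumption).
Clade = FrozenSet[str]


def genes_supporting_clade(
    gene_clades: Dict[str, Set[Clade]], target: Clade
) -> List[str]:
    """Return the names of genes whose tree contains `target` as a clade.

    Assumes every gene samples all taxa in `target`; handling missing
    taxa would need extra logic that is not shown here.
    """
    selected = []
    for gene, clades in sorted(gene_clades.items()):
        if target in clades:
            selected.append(gene)
    return selected


# Toy input: three hypothetical gene trees over taxa A, B, C, D.
toy_gene_clades = {
    "gene1": {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    "gene2": {frozenset({"B", "C"}), frozenset({"A", "B", "C"})},
    "gene3": {frozenset({"A", "B"}), frozenset({"C", "D"})},
}

# Question: which genes recover A and B as sister taxa?
question_specific = genes_supporting_clade(
    toy_gene_clades, frozenset({"A", "B"})
)
print(question_specific)
```

In this toy case the filter keeps `gene1` and `gene3` and drops `gene2`, so a downstream analysis of that one internode would use only the retained genes rather than the full set.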

Created:

2015-08-17