
Data from: Analyzing contentious relationships and outlier genes in phylogenomics

Source: DataONE. Updated: 2018-06-05. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/null
Description:
Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential “outlier” genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.
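As a rough illustration of how a few genes can drive a supermatrix result, the sketch below compares per-gene log-likelihoods under two competing topologies and flags any gene whose removal alone reverses the overall preference. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, gene labels, and all numeric values are hypothetical, and it assumes per-gene log-likelihoods have already been computed elsewhere.

```python
from typing import Dict, List, Tuple


def delta_lnl(lnl_a: Dict[str, float], lnl_b: Dict[str, float]) -> Dict[str, float]:
    """Per-gene lnL(topology A) - lnL(topology B); positive values favour A."""
    return {gene: lnl_a[gene] - lnl_b[gene] for gene in lnl_a}


def decisive_genes(deltas: Dict[str, float]) -> List[Tuple[str, float]]:
    """Genes whose single removal flips the sign of the summed difference."""
    total = sum(deltas.values())
    flipped = [(g, d) for g, d in deltas.items() if (total - d) * total < 0]
    # Largest absolute contribution first.
    return sorted(flipped, key=lambda item: abs(item[1]), reverse=True)


# Made-up log-likelihoods for four hypothetical genes (NOT from the dataset).
lnl_a = {"gene1": -1000.0, "gene2": -2000.0, "gene3": -1500.0, "gene4": -3000.0}
lnl_b = {"gene1": -998.0, "gene2": -1997.0, "gene3": -1499.0, "gene4": -3040.0}

deltas = delta_lnl(lnl_a, lnl_b)
print(sum(deltas.values()))    # summed support across all genes
print(decisive_genes(deltas))  # genes that single-handedly decide the outcome
```

In this toy input, three genes each weakly favour topology B while one gene strongly favours topology A, so the summed score favours A only because of that single gene. This mirrors the situation the description outlines, in which a very small number of genes determines the supermatrix topology.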

Created:
2018-06-05