
Data from: Conserved genes, sampling error, and phylogenomic inference

Source: DataONE. Updated: 2013-11-26. Indexed: 2024-06-27.
Download link:
https://search.dataone.org/view/null
Description:
Disagreement or conflict among phylogenetic hypotheses obtained by analysis of large, genome-wide databases has incited debate over the potential benefits, pitfalls, and best practices associated with phylogenomic approaches (Jeffroy, O., Brinkmann, H., et al. 2006; Philippe, H., Derelle, R., et al. 2009; Philippe, H., Brinkmann, H., et al. 2011). In a recent article, Salichos, L. and Rokas, A. (2013; S&R) assert that the accuracy of phylogenetic inference from genomic data can be improved by focusing on the subset of genes that have “strong” phylogenetic signals, as measured by bootstrap support of their inferred trees. In that study, S&R compared 23 yeast genomes and observed that the genealogies obtained for 1070 orthologous genes were all different from each other and also differed from the topology obtained either by concatenating all genes or by an extended consensus phylogeny of all gene trees. They developed a new measure of incongruence (“internode certainty”) to gauge the level of conflict inherent in the data supporting specific internodes of the phylogeny. Based on this measure, S&R claim that slowly evolving genes are a main source of conflict, suggesting that they should be avoided in favor of genes with strong phylogenetic signals. Their conclusion that strong signal reduces incongruence is drawn from the comparative phylogenetic analysis of protein alignments of the yeast genomes as well as from a reanalysis of published vertebrate and metazoan data. The notion that slowly evolving genes are a bad choice for resolving basal nodes at deep phylogenetic levels is contrary to widespread practice in recent studies (e.g., Li, C., Orti, G., et al. 2007; Jian, S., Soltis, P.S., et al. 2008; Li, C., Lu, G., et al. 2008; Regier, J.C., Shultz, J.W., et al. 2008; Zhang, N., Zeng, L., et al. 2012; Lang, J.M., Darling, A.E., et al. 2013). We challenge S&R’s interpretations herein with new analyses of their yeast data.
We first demonstrate that the high phylogenetic incongruence among conserved genes observed by S&R is likely an artifact of sampling error. Second, we challenge their premise that bootstrap support is a reliable measure of the historical signal of genes, because it excludes systematic error as an alternative explanation of the observed pattern; systematic error has previously been shown to have consequences for phylogenetic analyses of yeast genomes (Collins, T.M., Fedrigo, O., et al. 2005). Finally, we note that S&R’s recommendations make the task of choosing genes upon which to base phylogenetic inferences impossible, as measures of phylogenetic signal can be determined only after data have been collected. Recommendations to focus on selective data partitions assessed by phylogenetic analysis may become relevant in the future, however, once complete genomes are available for all species of interest.

Created:
2013-11-26