Data from: Conserved genes, sampling error, and phylogenomic inference
Source: DataONE | Updated: 2013-11-26 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Disagreement or conflict among phylogenetic hypotheses obtained by analysis of large, genome-wide databases has incited debate over potential benefits, pitfalls, and best practices associated with phylogenomic approaches (Jeffroy, O., Brinkmann, H., et al. 2006; Philippe, H., Derelle, R., et al. 2009; Philippe, H., Brinkmann, H., et al. 2011). In a recent article, Salichos, L. and Rokas, A. (2013; S&R) assert that accuracy of phylogenetic inference from genomic data can be improved by focusing on the subset of genes that have “strong” phylogenetic signals as measured by bootstrap support of their inferred trees. In that study, S&R compared 23 yeast genomes and observed that genealogies obtained for 1070 orthologous genes were all different from each other and also differed from the topology obtained either by concatenating all genes or by an extended consensus phylogeny of all gene trees. They developed a new measure of incongruence (“internode certainty”) to gauge the level of conflict inherent in the data supporting specific internodes of the phylogeny. Based on this measure, S&R claim that slowly-evolving genes are a main source of conflict, suggesting that they should be avoided in favor of genes with strong phylogenetic signals. Their conclusion that strong signal reduces incongruence is drawn from the comparative phylogenetic analysis of protein alignments of the yeast genomes as well as from a reanalysis of published vertebrate and metazoan data. The notion that slowly-evolving genes are a bad choice to resolve basal nodes at deep phylogenetic levels is contrary to widespread practice in recent studies (e.g., Li, C., Orti, G., et al. 2007; Jian, S., Soltis, P.S., et al. 2008; Li, C., Lu, G., et al. 2008; Regier, J.C., Shultz, J.W., et al. 2008; Zhang, N., Zeng, L., et al. 2012; Lang, J.M., Darling, A.E., et al. 2013). We challenge S&R’s interpretations herein with new analyses of their yeast data.
We first demonstrate that the high phylogenetic incongruence among conserved genes observed by S&R is likely an artifact due to sampling error. Second, we challenge their premise that bootstrap support is a reliable measure of the historical signal of genes, because it ignores systematic error as an alternative explanation of the observed pattern, an error source previously shown to affect phylogenetic analyses of yeast genomes (Collins, T.M., Fedrigo, O., et al. 2005). Finally, we note that S&R’s recommendations make the task of choosing genes upon which to base phylogenetic inferences impossible, as measures of phylogenetic signal can only be determined after data have been collected. Recommendations to focus on selective data partitions assessed by phylogenetic analysis may become relevant in the future, however, once complete genomes are available for all species of interest.
Created:
2013-11-26



