Data from: Dissecting signal and noise in diatom chloroplast protein encoding genes with phylogenetic information profiling

DataONE · 2016-07-01 · Updated 2024-06-26 · Indexed
Download link:
https://search.dataone.org/view/null
Description:
Previous analyses of single diatom chloroplast protein-encoding genes recovered results highly incongruent with both traditional phylogenies and phylogenies derived from the nuclear-encoded small subunit (SSU) gene. Our analysis here of six individual chloroplast genes (atpB, psaA, psaB, psbA, psbC and rbcL) obtained similar anomalous results. However, phylogenetic noise in these genes did not appear to be correlated, and their concatenation appeared to effectively sum their collective signal. We empirically demonstrated the value of combining phylogenetic information profiling, partitioned Bremer support and entropy analysis in examining the utility of various partitions in phylogenetic analysis. Noise was low in the 1st and 2nd codon positions, but so was signal. Conversely, high noise levels in the 3rd codon position were accompanied by high signal. Perhaps counterintuitively, simple exclusion experiments demonstrated that this was especially true at deeper nodes, where the 3rd codon position contributed most to a result congruent with morphology and SSU (and with the total evidence tree here). Consistent with our empirical findings, the probability of correct signal (derived from information profiling) increased and the statistical significance of substitutional saturation decreased as data were aggregated. In this regard, the aggregated 3rd codon position performed as well as or better than more slowly evolving sites. Simply put, direct methods of noise removal (elimination of fast-evolving sites) disproportionately removed signal. Information profiling and partitioned Bremer support suggest that adding chloroplast data will rapidly improve our understanding of diatom phylogeny, but they also illustrate that some parts of the diatom tree are likely to remain recalcitrant to the addition of molecular data.
Methods based on information profiling have been criticized for their numerous assumptions and parameter estimates, and for being based on quartets of taxa. Our empirical results support theoretical arguments that the simplifying assumptions made in these methods are robust to “real-life” situations.
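
The per-codon-position entropy comparison and the simple exclusion experiments mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: it assumes an in-frame nucleotide alignment supplied as equal-length Python strings, treats '-', '?' and 'N' as missing data, and uses hypothetical helper names (`shannon_entropy`, `entropy_by_codon_position`, `exclude_codon_position`) and toy sequences that are not part of this dataset.

```python
import math
from collections import Counter

# Hypothetical helpers; not the procedure used by the dataset authors.
MISSING = set("-?N")  # assumed gap/ambiguity symbols, ignored when counting states


def shannon_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring missing symbols."""
    states = [c for c in column.upper() if c not in MISSING]
    if not states:
        return 0.0
    n = len(states)
    # Sum of -p * log2(p) over the observed nucleotide frequencies.
    return sum(-(k / n) * math.log2(k / n) for k in Counter(states).values())


def entropy_by_codon_position(seqs):
    """Mean per-site entropy for codon positions 1, 2 and 3.

    Assumes `seqs` is an in-frame alignment: equal-length strings whose
    first column is a 1st codon position.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    per_position = {1: [], 2: [], 3: []}
    for i in range(length):
        column = "".join(s[i] for s in seqs)
        per_position[i % 3 + 1].append(shannon_entropy(column))
    return {pos: (sum(v) / len(v) if v else 0.0) for pos, v in per_position.items()}


def exclude_codon_position(seqs, position):
    """Drop every site at the given codon position (1, 2 or 3), as in an exclusion run."""
    return ["".join(c for i, c in enumerate(s) if i % 3 + 1 != position) for s in seqs]


# Toy alignment (hypothetical): only the 3rd position of the 2nd codon varies.
toy = ["ATGAAA", "ATGAAC", "ATGAAG", "ATGAAT"]
print(entropy_by_codon_position(toy))  # {1: 0.0, 2: 0.0, 3: 1.0}
print(exclude_codon_position(toy, 3))  # ['ATAA', 'ATAA', 'ATAA', 'ATAA']
```

Entropy here only measures variability (a proxy for potential noise); per the abstract, high-variability 3rd positions can still carry most of the signal at deeper nodes, so an exclusion run like the one above is something to test against tree support rather than a default cleaning step.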

Created: 2016-07-01