Data from: Concatenated alignments and the case of the disappearing tree
Source: DataONE | Updated: 2015-01-03 | Indexed: 2024-06-27
Download link:
https://search.dataone.org/view/null
Description:
Background
Analyzed individually, gene trees for a given taxon set tend to harbour incongruent or conflicting signals. One popular approach to deal with this circumstance is to use concatenated data. But especially in prokaryotes, where lateral gene transfer (LGT) is a natural mechanism of generating genetic diversity, there are open questions as to whether concatenation amplifies or averages phylogenetic signals residing in individual genes. Here we analyze concatenations of prokaryotic and eukaryotic datasets to investigate possible sources of incongruence in phylogenetic trees and to examine the level of overlap between individual and concatenated alignments.
Results
We analyzed prokaryotic datasets comprising 248 individual gene trees from 315 genomes at three taxonomic depths spanning gammaproteobacteria, proteobacteria, and prokaryotes (bacteria plus archaea), and eukaryotic datasets comprising 279 individual gene trees from 85 genomes at two taxonomic depths: across plants-animals-fungi and within fungi. Consistent with previous findings, the branches in trees made from concatenated alignments are, in general, not supported by any of their underlying individual gene trees, even though the concatenation trees tend to possess high bootstrap proportion values. For the prokaryote data, this observation is independent of phylogenetic depth and sequence conservation. The eukaryotic data show much better agreement between concatenation and single gene trees. LGT frequencies in trees were estimated using established methods. Sequence length in individual alignments, but not sequence divergence, was found to correlate with the generation of branches that correspond to the concatenated tree.
Conclusions
The weak correspondence of concatenation trees with single gene trees raises the question of where the phylogenetic signal in concatenated trees is coming from. The eukaryote data reveal a better correspondence between individual and concatenation trees than the prokaryote data. The question of whether the lack of correspondence between individual genes and the concatenation tree in the prokaryotic data is due to LGT or phylogenetic artefacts remains unanswered. If LGT were the cause of incongruence between concatenation and individual trees, we would have expected to see greater degrees of incongruence for more divergent prokaryotic data sets, which was not observed, although estimated rates of LGT suggest that LGT is responsible for at least some of the observed incongruence.
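The comparison described in the Results (checking whether each branch of the concatenation tree also appears in the individual gene trees) can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes, for simplicity, that every tree covers the same taxon set and is written as nested tuples of leaf names; the function names `leaves`, `splits`, and `support_counts` and the five-taxon toy trees are hypothetical and do not come from the dataset.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree (assumed format)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, read as unrooted.

    Each bipartition is stored as the side that does NOT contain an
    arbitrary anchor taxon, so equal branches compare equal across trees.
    """
    all_taxa = leaves(tree)
    anchor = min(all_taxa)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_taxa - clade if anchor in clade else clade
        # Keep only splits with at least two taxa on each side.
        if 2 <= len(side) <= len(all_taxa) - 2:
            found.add(side)
        return clade

    visit(tree)
    return found


def support_counts(concat_tree, gene_trees):
    """For each branch of the concatenation tree, count the gene trees that contain it."""
    gene_splits = [splits(t) for t in gene_trees]
    return {s: sum(s in gs for gs in gene_splits) for s in splits(concat_tree)}


# Toy data, invented for illustration only.
concat = ((("A", "B"), "C"), ("D", "E"))
genes = [
    ((("A", "B"), "C"), ("D", "E")),
    ((("A", "C"), "B"), ("D", "E")),
    ((("A", "B"), "D"), ("C", "E")),
]

for split, count in support_counts(concat, genes).items():
    print(sorted(split), f"recovered in {count} of {len(genes)} gene trees")
```

A branch with a count of zero would correspond to the situation the abstract reports for the prokaryotic data: a branch present in the concatenation tree but found in none of the underlying gene trees. Real gene trees often cover different taxon subsets, which this sketch does not handle.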
Created:
2015-01-03



