Conflicting phylogenetic signals in genomic data of the coffee family (Rubiaceae)

Mendeley Data. Updated 2024-04-12; indexed 2024-06-27.

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.q573n5tf4

Description:

Reconstructions of phylogenetic relationships in the flowering plant family Rubiaceae, or the coffee family, have until now relied heavily on single-gene or multi-gene data, primarily from the plastid compartment. With the availability of cost- and time-efficient techniques for generating complete genome sequences, the opportunity arises to resolve some of the relationships that have so far proven problematic. Here we contribute new data from 58 complete plastid genome sequences representing 55 of the 65 currently recognized tribes of the Rubiaceae. Also contributed are new data from the nuclear rDNA cistrons for the corresponding taxa. Phylogenetic analyses are conducted on two plastid data sets, one including data from the protein-coding genes only, with a total of 69,828 aligned characters, and a second where the protein-coding data are combined with an additional 25,666 aligned characters from non-coding regions, and on a nuclear rDNA data set including 6,045 aligned characters. Our results clearly show that simply adopting a “more characters” approach does not resolve the relationships in the Rubiaceae. More importantly, we identify conflicting phylogenetic signals in the data. Analyses of the same plastid data, treated as nucleotides or as codon-degenerated data, resolve and support conflicting topologies in the subfamily Cinchonoideae. As these analyses use the same data, we interpret the conflict to result from erroneous assumptions in the models used to reconstruct our phylogenies. Conflicting signals are also identified in the analyses of the plastid vs. the nuclear rDNA data sets. These analyses use data from different genomic compartments, with different inheritance patterns, and we interpret the conflicts as representing “real” conflicts, reflecting biological processes of the past.
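
The description contrasts plastid data treated as plain nucleotides with the same data treated as codon-degenerated characters. It does not say which tool or degeneration scheme was used, so the following is only a minimal sketch of the general idea, assuming the standard genetic code and full degeneration (every codon is replaced by the IUPAC ambiguity codes that cover all codons for the same amino acid). The function names `degenerate_codon` and `degenerate` are hypothetical and chosen for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
# Assumption: plastid protein-coding genes are treated with the standard code here.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes, keyed by the set of bases each code stands for.
IUPAC = {
    frozenset(bases): code
    for code, bases in {
        "A": "A", "C": "C", "G": "G", "T": "T",
        "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
        "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    }.items()
}


def degenerate_codon(codon: str) -> str:
    """Replace one codon by ambiguity codes covering all of its synonymous codons."""
    aa = CODON_TABLE.get(codon)
    if aa is None:
        # Gaps, ambiguous bases, or an incomplete trailing codon are left unchanged.
        return codon
    synonymous = [c for c, a in CODON_TABLE.items() if a == aa]
    return "".join(IUPAC[frozenset(c[i] for c in synonymous)] for i in range(3))


def degenerate(seq: str) -> str:
    """Degenerate an in-frame coding sequence codon by codon."""
    seq = seq.upper()
    return "".join(degenerate_codon(seq[i:i + 3]) for i in range(0, len(seq), 3))


if __name__ == "__main__":
    # Two different leucine codons (CTT, TTA) collapse to the same pattern.
    print(degenerate("ATGCTTTTA"))
```

Under this scheme synonymous differences between sequences disappear, so only changes that alter the encoded amino acid remain informative. Published degeneration schemes differ in detail (for example in how serine or stop codons are handled), so this sketch should not be read as the procedure behind the dataset.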

Created:

2023-06-28