
The expansion and diversification of pentatricopeptide repeat RNA editing factors in plants

Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.vdncjsxqf
Description:
The RNA-binding pentatricopeptide repeat (PPR) family comprises hundreds to thousands of genes in most plants, but only a few dozen in algae, evidence of massive gene expansions during land plant evolution. The nature and timing of these expansions have not been well defined due to the sparse sequence data available from early-diverging land plant lineages. We exploit the comprehensive OneKP dataset of over 1,000 transcriptomes from diverse plants and algae to establish a clear picture of the evolution of this massive gene family, focusing on the proteins typically associated with RNA editing, which show the most spectacular variation in numbers and domain composition across the plant kingdom. We characterise over 2,250,000 PPR motifs in over 400,000 proteins. In lycophytes, polypod ferns and hornworts, nearly 10% of expressed protein-coding genes encode putative PPR editing factors, whereas they are absent from algae and complex-thalloid liverworts. We show that rather than a single expansion, most land plant lineages with high numbers of editing factors have continued to generate novel sequence diversity. We identify sequence variation that implies functional differences between PPR proteins in seed plants versus non-seed plants and that we propose is linked to seed-plant-specific editing cofactors. Finally, using the sequence variation across the dataset, we develop a structural model of the catalytic DYW domain associated with C-to-U editing and identify a clade of unique DYW variants that are strong candidates as U-to-C RNA editing factors, given their phylogenetic distribution and sequence characteristics.

Provider:
Dryad
Created:
2019-12-11