
Pantagruel phylogenomic analysis of a bacterial pangenome covering 571 Rhizobiaceae genomes, with a focus on 41 Pseudorhizobium and Neorhizobium genomes

DataCite Commons. Updated: 2020-08-27. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/Pantagruel_phylogenomic_analysis_of_a_bacterial_pangenome_covering_571_Rhizobiaceae_genomes_with_a_focus_on_41_Pseudorhizobium_and_Neorhizobium_genomes/8320142
Resource description:
Comparative genomic analysis of 571 Rhizobiaceae genomes was conducted using the Pantagruel pipeline under its default settings, as described by Lassalle et al. (2019). This pipeline is designed for the analysis of bacterial pangenomes, including the inference of a species tree and gene trees, and the detection of horizontal gene transfers (HGT) through species tree/gene tree reconciliations (Szöllősi et al. 2015).

All information derived from the Pantagruel processing of the 571-genome Rhizobiaceae pangenome is stored in a SQLite 3 database. This includes:
- metadata on the 571 organisms and their genome assemblies;
- annotation of 3,370,771 coding sequences (CDSs);
- their classification into 50,792 homologous gene families.

In addition, for a subset of 41 genomes of strains of the genera Pseudorhizobium and Neorhizobium and their close relatives, a robust species tree and gene trees were reconstructed for 6,629 gene families represented in these genomes (covering 757,162 CDSs). For these gene families, species tree/gene tree reconciliation was used to infer an evolutionary scenario for each gene lineage, comprising events of gene duplication, origination, HGT and loss. This allowed the classification of genes into orthologous groups (i.e. a sub-clustering of the 6,629 homologous gene families). The database thus also holds information on:
- 3,038,614 gene lineage events;
- 19,016 orthologous groups.
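The description says the results are stored in a SQLite 3 database but does not give its schema. A cautious first step after downloading is therefore to list the tables and their row counts rather than assume any table names. The sketch below does this with Python's standard `sqlite3` module. The function name `summarize_tables` and the file name in the usage note are illustrative assumptions, not part of the dataset or of Pantagruel.

```python
import sqlite3


def summarize_tables(db_path):
    """Return {table_name: row_count} for every table in a SQLite file.

    No table names are assumed: they are read from the built-in
    sqlite_master catalog that every SQLite database exposes.
    """
    conn = sqlite3.connect(db_path)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names remain valid SQL.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, `summarize_tables("pantagruel_rhizobiaceae.sqlite")` (a hypothetical local file name) would return a dictionary that can be compared against the counts quoted above, such as the number of coding sequences or gene families. Note that `sqlite3.connect` creates an empty file if the path does not exist, so an empty result usually means the path is wrong.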

Provider:
figshare
Created:
2019-06-25