Species trees of 571 Rhizobiaceae genomes, with a focus on 41 Pseudorhizobium and Neorhizobium, based on core genome gene concatenates
Source: DataCite Commons. Updated: 2025-04-01. Indexed: 2024-08-17.
Download link:
https://figshare.com/articles/Species_trees_of_571_Rhizobiaceae_genomes_with_a_focus_on_41_Pseudorhizobium_and_Neorhizobium_based_on_core_genome_gene_concatenates/8316827/2
Description:
Genomic dataset and gene family classification

We assembled a complete bacterial genome dataset covering all known representatives of the subgroup in the alphaproteobacterial families Rhizobiaceae and (sister group) Aurantimonadaceae. This dataset comprises all 564 genomes available from the NCBI RefSeq Assembly database on 23 Apr 2018, after filtering out anomalous genomes and those with a contig N50 < 98 kb, using the query: ‘(txid82115[Organism:exp] OR txid255475[Organism:exp]) AND ("latest refseq"[filter] NOT anomalous[filter]) AND ("98000"[ContigN50] : "20000000"[ContigN50])’. From this dataset we removed assembly GCF_002000045.1 (R. flavum YW14), for which we had a newer, higher-quality assembly. To the remaining 563 genomes we added the new genome sequences from the eight strains mentioned above, for a total of 571 genomes (dataset ‘571Rhizob’).

Reference species trees

From the 571Rhizob genome dataset, we defined the pseudo-core genome as the set of genes occurring only in a single copy and present in at least 561 of the 571 genomes (98%). The resulting pseudo-core genome gene set (hereafter referred to as pCG<sub>571</sub>) includes 155 loci, whose protein alignments were concatenated. This concatenated protein alignment was used to compute a reference species tree (S<sub>ML571</sub>) with RAxML (Stamatakis 2014) under the model PROTCATLGX; branch supports were estimated by generating 200 rapid bootstraps under the same parameters. From the S<sub>ML571</sub> tree, we identified a well-supported clade of 41 genomes that includes all representatives of Neorhizobium spp. and Pseudorhizobium spp. as well as our new isolates (dataset ‘41NeoPseudo’). To gain further phylogenetic resolution in this clade of interest, we restricted the pCG<sub>571</sub> concatenated alignment to the 41 genomes of this smaller dataset and used it as input to the Phylobayes program for a more accurate (but computationally more expensive) Bayesian phylogenetic inference under the CAT-GTR+G4 model (Lartillot et al. 2007). This provided us with a robust non-ultrametric tree for the 41 genomes (S<sub>BA41</sub>). Finally, we used this S<sub>BA41</sub> tree as a fixed input topology for Phylobayes to infer an ultrametric tree (unitless ‘time’ tree) under the CIR clock model (Lepage et al. 2007), hereafter referred to as T<sub>BA41</sub>.
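The pseudo-core selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `pseudo_core_families`, the input layout (a mapping from gene family to the genome of each member gene), and the toy data are all hypothetical. It also assumes one reading of the rule, namely that a family qualifies when it has at most one copy in every genome and occurs in at least `min_genomes` genomes.

```python
from collections import Counter


def pseudo_core_families(family_members, min_genomes):
    """Return the IDs of gene families that satisfy the pseudo-core rule.

    family_members: dict mapping a family ID to a list of genome IDs,
        with one list entry per gene copy (hypothetical input layout).
    min_genomes: minimum number of distinct genomes carrying the family.
    """
    selected = []
    for family_id, genomes in family_members.items():
        copies_per_genome = Counter(genomes)
        # Assumption: a family with two or more copies in any genome is rejected.
        if any(count > 1 for count in copies_per_genome.values()):
            continue
        # Keep the family only if enough distinct genomes carry it.
        if len(copies_per_genome) >= min_genomes:
            selected.append(family_id)
    return sorted(selected)


# Toy data, invented for illustration only.
toy = {
    "famA": ["g1", "g2", "g3", "g4"],  # single copy, 4 genomes
    "famB": ["g1", "g1", "g2", "g3"],  # duplicated in g1
    "famC": ["g1", "g2"],              # too rare
    "famD": ["g1", "g2", "g3"],        # single copy, 3 genomes
}

print(pseudo_core_families(toy, min_genomes=3))
```

For the dataset described here, the threshold would be `min_genomes=561` applied to gene families from the 571 genomes. The threshold is passed as an absolute count rather than derived from the rounded 98% figure, since 98% of 571 does not give exactly 561.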
Provider:
figshare
Created:
2019-06-25



