Evolution scenarios for gene families of the RhcI-T3SS and nod operon and co-evolution scores

Source: DataCite Commons. Updated: 2021-04-09. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/dataset/Evolution_scenarios_for_gene_families_of_the_RhcI-T3SS_and_nod_operon_and_co-evolution_scores/12191103
Description:
We conducted gene tree/species tree reconciliation analyses to infer evolution scenarios for the following homologous gene families under a gene duplication, transfer and loss (DTL) model: nodA, nodB, nodC, rhcC2, rhcD, rhcN, rhcV, nopC, nopM, nopL and nopT. We used GeneRax v1.2.0 (Morel et al. 2019) to jointly estimate the gene phylogeny and the gene evolution scenario, with the following parameters: -r UndatedDTL --max-spr-radius 5 --strategy SPR --per-family-rates, and either the option --reconcile to obtain the ML reconciliation or the option --reconciliation-samples 1000 to obtain a sample of 1,000 sub-optimal reconciliations. As input to GeneRax, we provided the core-genome-alignment-based species tree and protein alignments obtained with ClustalOmega v1.2.4 with default parameters (Sievers et al. 2011).

We then compared these scenarios to compute a co-evolution score using the script gene_family_co-evolution.r from the bioinformatic pipeline Pantagruel (Lassalle et al. 2019). For two gene families A and B, the score was defined as the sum over the species tree of the square root of the joint probabilities of events observed for both families, scaled by the size of the largest gene tree.

The significance of this score was evaluated by comparing it to a null distribution of scores generated by sampling from the Bradyrhizobium pangenome as obtained using the Pantagruel pipeline. Specifically, gene families were selected within the Bradyrhizobium pangenome to have a prevalence amongst the 155 studied genomes similar to that of the tested gene families, based on their histogram of gene copy numbers: pangenome gene families whose count distribution was similar to that of at least one tested gene family (Chi-squared test, p > 0.75) were selected as part of the control set.
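The co-evolution score defined above can be illustrated with a minimal Python sketch. It is not the Pantagruel script gene_family_co-evolution.r, and it rests on several assumptions that the description does not confirm: each event is keyed by a hypothetical (event type, species tree branch) pair, its probability is taken as its frequency among sampled reconciliations, the joint probability is taken as the product of the two per-family probabilities, and the "size" of a gene tree is taken as its number of leaves.

```python
from math import sqrt


def coevolution_score(events_a, events_b, size_a, size_b):
    """Sketch of a co-evolution score between two gene families.

    events_a, events_b: dicts mapping an event key, here a hypothetical
        (event_type, species_branch) tuple, to the probability of that
        event (assumed: its frequency among sampled reconciliations).
    size_a, size_b: sizes of the two gene trees (assumed: leaf counts).
    """
    # Only events observed for both families contribute to the sum.
    shared = events_a.keys() & events_b.keys()
    # Assumption: joint probability = product of per-family probabilities.
    total = sum(sqrt(events_a[e] * events_b[e]) for e in shared)
    # Scale by the size of the larger of the two gene trees.
    return total / max(size_a, size_b)


# Made-up event probabilities, for illustration only.
fam_a = {("T", "N12->N30"): 0.81, ("D", "N7"): 0.25}
fam_b = {("T", "N12->N30"): 0.64, ("L", "N7"): 0.50}

score = coevolution_score(fam_a, fam_b, size_a=40, size_b=36)
print(score)
```

In this toy input only the shared transfer on branch N12->N30 contributes; the duplication and the loss at N7 are different event types and therefore do not count as co-events under the keying assumed here.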
Because we aimed to provide a distribution of scores under the null hypothesis that the gene families did not co-evolve, we excluded the pangenome gene families that we expected to co-evolve with our test gene families, typically genes involved in type 3 secretion system (T3SS) biosynthesis, genes coding for T3SS effector proteins (T3Es), or genes related to legume-rhizobium symbiosis. We thus filtered out gene families whose annotation matched (case-insensitively) any of the following keywords: ‘Nod’, ‘Nop’, ‘Rhc’, ‘Sct’, ‘Hrc’, ‘Hrp’, ‘E3 ubiquitin--protein ligase’, ‘YopT’. Given the distribution of test family copy numbers indicated in Annex Table 1 and the keyword filtering, 184 control gene families were selected out of 30,078 pangenome gene families. GeneRax scenarios and co-evolution scores were then computed within this set of 184 control families in the same way as for the test families. We then compared the score obtained for each pair of test gene families to the distribution of scores obtained for either test family against all control families; the rank of the test score within the control distribution (scaled by the number of control scores) provides the rank p-value.

Contents:
- clustalo_prot_ali_nostop.tar.gz: ClustalOmega protein alignments in FASTA format;
- core-genome-based_reference_tree_Brady2019.full: reference species tree in Newick format;
- ML_reconciliations_rhc-nod-nop.tar.gz: output of GeneRax ML reconciliations;
- ML_reconciliations_rhc-nod-nop_graphics.tar.gz: PDF files with plots of ML scenarios, one per gene family;
- sample_reconciliations_rhc-nod-nop.tar.gz: output of GeneRax Bayesian sampling of reconciliations;
- albin_similar_phyloprofiles.r: R script for selection of control families;
- full_families_genome_counts-noORFans.mat.gz: gene family count matrix of the whole bradyrhizobial pangenome;
- pangenome_phyletic_profiles.tar.gz: presence/absence profiles of test and pangenome gene families and selected control families;
- coevolution.zip: results of the co-evolution analysis, including plots and table files of co-evolution scores and rank p-values for the test and test+control gene family sets, as well as lists of high-probability co-events, for all co-event types (DTSL) or co-transfers only (T).

NB: the full dataset for reconstruction of the species tree is available at https://doi.org/10.6084/m9.figshare.14388440.v1
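The keyword filtering and the rank p-value described in the methods can be sketched as follows. This is a minimal illustration and not the code shipped in albin_similar_phyloprofiles.r or coevolution.zip. Two points are assumptions: the keywords are matched as case-insensitive substrings of the annotation, and the rank p-value is taken as the fraction of control scores greater than or equal to the test score (the description does not say how ties or the rank offset are handled). The family names, annotations, and scores are made up.

```python
# Keywords listed in the description; matching them as case-insensitive
# substrings is an assumption of this sketch.
EXCLUDED_KEYWORDS = [
    "Nod", "Nop", "Rhc", "Sct", "Hrc", "Hrp",
    "E3 ubiquitin--protein ligase", "YopT",
]


def is_excluded(annotation):
    """Return True if the annotation matches any excluded keyword."""
    text = annotation.lower()
    return any(keyword.lower() in text for keyword in EXCLUDED_KEYWORDS)


def rank_p_value(test_score, control_scores):
    """Fraction of control scores at least as high as the test score.

    Assumption: higher scores mean stronger co-evolution, and the rank is
    scaled by the number of control scores with no pseudocount.
    """
    n_at_least = sum(1 for s in control_scores if s >= test_score)
    return n_at_least / len(control_scores)


# Hypothetical gene family annotations.
annotations = {
    "fam1": "Nodulation protein NodA",
    "fam2": "DNA gyrase subunit B",
    "fam3": "putative E3 ubiquitin--protein ligase",
}
kept = [fam for fam, ann in annotations.items() if not is_excluded(ann)]

# Hypothetical control scores for one test family pair.
controls = [0.001, 0.004, 0.010, 0.020]
p = rank_p_value(0.018, controls)
print(kept, p)
```

Substring matching is deliberately broad: a short keyword such as ‘Nod’ also matches unrelated words that contain it, so a stricter word-boundary match may be preferable when fewer exclusions are wanted.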
Provider:
figshare
Created:
2020-04-26