Supplemental Data for Kogay and Zhaxybayeva (2022)
Source: DataCite Commons (updated 2022-09-14; indexed 2024-07-29)
Download link:
https://figshare.com/articles/dataset/Supplemental_Data_for_Kogay_and_Zhaxybayeva_2022_/20082749
Description:
<strong>GenBank accession numbers:</strong> <strong>208_genomes_accessions.xlsx: </strong>List of the 208 selected alphaproteobacterial genomes with GTA ‘head-tail’ clusters. <strong>g7_replacement_Sphingomonadales.pdf:</strong> GenBank accession numbers of the putative g7 protein found in 11 <em>Sphingomonadales</em> genomes. <br> <strong>Gene families in 208 genomes:</strong> <strong>orthogroups.tsv.zip:</strong> Gene families in the 208 alphaproteobacterial genomes, constructed using only genes at least 300 nucleotides long. Each line in the file represents one gene family (an orthogroup). Each family member is identified by the RefSeq ID of its genome, joined by an underscore to the RefSeq ID of the gene's protein sequence. <br> <strong>GTA gene predictions:</strong> <strong>gta_regions.xlsx: </strong>Predicted GTA ‘head-tail’ clusters in the initial dataset of 212 genomes. The columns for individual GTA genes list their RefSeq accession numbers; empty cells indicate that a gene was not detected in a genome. The 208 genomes retained for the selection analyses are highlighted in green. <br> <strong>Effective Number of Codons (ENC) calculations:</strong> <strong>codonW_enc_gc3s.zip:</strong> Effective number of codons (ENC) and GC3s values for genes at least 300 nucleotides long in the 208 alphaproteobacterial genomes. Each genome is represented by one file, and individual genes are identified by the RefSeq ID of their protein. <strong>enc_deviation_gta_genes.xlsx:</strong> Deviation (in %) of the ENC values of the reference GTA genes in the 208 genomes from the null model of no codon bias. An empty cell indicates either that the GTA gene is absent from the genome or that its observed ENC was higher than expected (such values are unreliable because codons are sampled from a gene sequence of finite length). 
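The null-model calculation behind enc_deviation_gta_genes.xlsx can be sketched in a few lines of Python. This is not the authors' exp_enc_deviation.py: it assumes Wright's (1990) expected-ENC formula, ENC_exp = 2 + s + 29 / (s^2 + (1 - s)^2) with s = GC3s given as a fraction, and it assumes the deviation is the percentage by which the observed ENC falls below that expectation. The function names are illustrative only.

```python
from typing import Optional


def expected_enc(gc3s: float) -> float:
    """Expected ENC with no codon bias (Wright 1990), from GC3s as a fraction."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def enc_deviation(enc_observed: float, gc3s: float) -> Optional[float]:
    """Percent by which the observed ENC falls below the null expectation.

    Assumed formula: (ENC_exp - ENC_obs) / ENC_exp * 100. Returns None when
    the observed ENC is higher than expected, the case the spreadsheet
    leaves empty as unreliable.
    """
    enc_expected = expected_enc(gc3s)
    if enc_observed > enc_expected:
        return None
    return (enc_expected - enc_observed) / enc_expected * 100


if __name__ == "__main__":
    # Hypothetical gene values, not taken from the dataset.
    print(expected_enc(0.5))                    # 60.5
    print(round(enc_deviation(48.4, 0.5), 2))   # 20.0
    print(enc_deviation(61.0, 0.5))             # None
```

Returning None rather than a negative number keeps genes with an observed ENC above the expectation out of any downstream averages, which matches the empty-cell rule described for the spreadsheet.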
<strong>rel_enc.xlsx:</strong> Deviation of the ENC values of the reference GTA genes in the 208 genomes, normalized by the average ENC deviation of all genes in the same genome. <br> <strong>tRNA Adaptation Index (tAI) calculations:</strong> <strong>stAIcalc_wi.zip: </strong>Codon adaptation indices (w<sub>i</sub>; i = 1-64) estimated by stAIcalc for genes at least 300 nucleotides long in the 208 alphaproteobacterial genomes. Each genome is represented by one file. For each genome, codons from all annotated genes were pooled to calculate the w<sub>i</sub> value of each codon. <strong>stAIcalc_tAI.zip: </strong>tRNA adaptation index (tAI) values for genes at least 300 nucleotides long in the 208 alphaproteobacterial genomes. Each genome is represented by one file that lists the calculated tAI value of each gene. Individual genes are identified by the RefSeq ID of the genome joined by an underscore to the RefSeq ID of the gene's protein sequence. <strong>ptAI_gta_genes.xlsx:</strong> Percentile tAI (ptAI) values of GTA genes that are at least 300 nucleotides long and broadly represented across the 208 genomes. Empty cells indicate that a GTA gene is absent from a genome. <br> <strong>Phylogenetic Generalized Least Squares (PGLS) analysis:</strong> <strong>orthogroups_PGLS.xlsx:</strong> PGLS model fits (slope and p-value) between each reference GTA gene and each of the other gene families across the 208 genomes. The fourteen gene families (listed in Table 1) with a significant model fit for every reference GTA gene are highlighted in yellow. <br> <strong>Phylogenetic Analyses:</strong> <strong>reference_aln_tree.zip: </strong>Concatenated alignment of 29 phylogenetic marker genes found in the 208 alphaproteobacterial genomes (FASTA format; reference_alignment.fasta) and the reference phylogenomic tree reconstructed from it (Newick format; reference_tree.nwk). 
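The per-gene quantities stored in stAIcalc_tAI.zip, ptAI_gta_genes.xlsx and rel_enc.xlsx can be illustrated with a short Python sketch. It assumes the usual definition of tAI as the geometric mean of the codon weights w_i over a gene's codons, takes ptAI to be a gene's percentile rank among all tAI values in its genome, and computes the relative ENC deviation as a gene's deviation divided by the genome-wide mean deviation. The helper names are hypothetical, and skipping codons with a missing or zero weight is a simplification, not a description of how stAIcalc handles them.

```python
import math
from statistics import mean
from typing import Dict, List


def gene_tai(cds: str, w: Dict[str, float]) -> float:
    """tAI of one gene: geometric mean of the codon weights w_i.

    Simplification: codons missing from `w` (e.g. stop codons) or with a
    zero weight are skipped rather than given a substitute weight.
    """
    cds = cds.upper()
    log_weights = []
    # Walk the coding sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(cds) - 2, 3):
        weight = w.get(cds[i:i + 3], 0.0)
        if weight > 0:
            log_weights.append(math.log(weight))
    if not log_weights:
        raise ValueError("no codons with a positive weight")
    # Averaging the logs and exponentiating gives the geometric mean.
    return math.exp(sum(log_weights) / len(log_weights))


def percentile_rank(value: float, genome_values: List[float]) -> float:
    """Assumed ptAI definition: percent of genome values <= `value`."""
    return 100 * sum(v <= value for v in genome_values) / len(genome_values)


def relative_enc_deviation(gene_dev: float, genome_devs: List[float]) -> float:
    """A gene's ENC deviation divided by the genome-wide mean deviation."""
    genome_mean = mean(genome_devs)
    if genome_mean == 0:
        raise ValueError("genome-wide mean deviation is zero")
    return gene_dev / genome_mean


if __name__ == "__main__":
    # Toy weights, not real stAIcalc output.
    weights = {"ATG": 1.0, "GCT": 0.5, "GCC": 0.25}
    tai = gene_tai("ATGGCTGCCTAA", weights)  # geometric mean of 1, 0.5, 0.25
    print(round(tai, 3))                                     # 0.5
    print(percentile_rank(0.5, [0.2, 0.5, 0.7, 0.9]))        # 50.0
    print(relative_enc_deviation(20.0, [10.0, 20.0, 30.0]))  # 1.0
```

Working in log space avoids underflow when multiplying hundreds of weights smaller than 1 for long genes.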
<strong>tonB_aln_tree.zip: </strong>Alignment of the <em>tonB</em> gene homologs (gene family OG0002642) detected in alphaproteobacterial genomes (FASTA format; tonB_alignment.fasta) and their phylogenetic relationships (Newick format; tonB_tree.nwk). <strong>tonB_phylogeny.pdf: </strong>The evolutionary history of the <em>tonB</em> gene family. <strong>gafA_aln_tree.zip: </strong>Alignment of the <em>gafA</em> gene homologs (gene family OG0001218) detected in alphaproteobacterial genomes (FASTA format; gafA_alignment.fasta) and their phylogenetic relationships (Newick format; gafA_tree.nwk). <strong>gafA_tree_comparisons.pdf:</strong> Phylogenies of <em>gafA</em>, the concatenated reference GTA genes, and the concatenated reference phylogenomic markers of GTA-containing genomes. <strong>ref_gta_aln_tree.zip: </strong>Concatenated alignment of the reference GTA genes in the 208 alphaproteobacterial genomes (FASTA format; ref_gta_alignment.fasta) and the phylogenetic tree reconstructed from it (Newick format; ref_gta_tree.nwk). <br> <strong>Code:</strong> <strong>exp_enc_deviation.py: </strong>Python script that calculates the expected effective number of codons (ENC) from GC3s content under the null model of no codon bias, and the deviation of the observed ENC from that expectation.
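RefSeq genome and protein accessions both contain an underscore (for example, NC_ or GCF_ prefixes for genomes and WP_ for proteins), so splitting the member identifiers used in orthogroups.tsv.zip and stAIcalc_tAI.zip on the first underscore would cut them in the wrong place. A minimal Python sketch, assuming the protein part always has the single-underscore form PREFIX_NUMBER.VERSION, splits at the second-to-last underscore instead. The function names and the example identifiers are placeholders, not taken from the dataset.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def split_member_id(member: str) -> Tuple[str, str]:
    """Split '<genome RefSeq ID>_<protein RefSeq ID>' into (genome, protein).

    Assumes the protein accession contains exactly one underscore
    (PREFIX_NUMBER.VERSION, e.g. WP_...), so the split is made at the
    second-to-last underscore; the genome ID may contain underscores too.
    """
    parts = member.strip().rsplit("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected member ID: {member!r}")
    genome, prefix, number = parts
    return genome, f"{prefix}_{number}"


def proteins_by_genome(members: List[str]) -> Dict[str, List[str]]:
    """Group the proteins of one gene family by the genome they come from."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for member in members:
        genome, protein = split_member_id(member)
        grouped[genome].append(protein)
    return dict(grouped)


if __name__ == "__main__":
    # Placeholder identifiers in the documented genome_protein layout.
    family = ["GCF_000000001.1_WP_000000001.1",
              "GCF_000000001.1_WP_000000002.1",
              "NC_999999.1_WP_000000003.1"]
    print(split_member_id(family[0]))  # ('GCF_000000001.1', 'WP_000000001.1')
    print(proteins_by_genome(family))
```

Grouping by genome makes it easy to spot families with several paralogs in one genome before matching them to per-genome tAI or ENC files.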
Provided by:
figshare
Created:
2022-06-16



