Supplementary Data for Farrell, Nesbø and Zhaxybayeva (2024)
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-11-06.
Download link:
https://figshare.com/articles/dataset/Supplementary_Data_for_Farrell_Nesb_and_Zhaxybayeva_2024_/26880394/1
Description:
<b><i>60toga_16SrRNA_tree.zip</i></b>: alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 60 <i>Thermotogota</i> species dataset<br>
<b><i>60toga_homogeneous_16SrRNA_thermometer.zip</i></b>: data for 16S rRNA analyses of the 60 <i>Thermotogota</i> species dataset under the homogeneous model<br>
<b>- 16SrRNA_cmalign_withAllASR.output:</b> the Infernal alignment of extant and reconstructed ancestral sequences<br>
<b>- allNodes_with_ASRgc_and_estimatedTemps.txt:</b> estimated stem GC content (second column) and estimated optimal growth temperature (OGT; third column) for all internal nodes (first column)<br>
<b>- tree_with_bootstrap_support_nodenames.pdf:</b> phylogenetic trees showing bootstrap support values and internal node labels<br>
<b><i>60toga_nonhomogeneous_16SrRNA_thermometer.zip</i></b>: data for 16S rRNA analyses of the 60 <i>Thermotogota</i> species dataset under the non-homogeneous model<br>
<b>- baseml_nonhomogeneous_16SrRNA_GC_Thermometer.ctl:</b> control file used for the baseml analyses<br>
<b>- reconstructed_seqs_infernalcm.aln:</b> the Infernal alignment of extant and ancestrally reconstructed sequences<br>
<b>- reconstructed_seqs.fna:</b> reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)<br>
<b>- baseml_output.txt:</b> raw output from the baseml analyses<br>
<b>- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py:</b> Python scripts for parsing the Infernal stem alignments and computing the GC content of the ancestral sequences<br>
<b><i>61toga_16SrRNA_tree.zip</i></b>: alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 61 <i>Thermotogota</i> strains dataset<br>
<b><i>61toga_homogeneous_16SrRNA_thermometer.zip</i></b>: data for 16S rRNA analyses of the 61 <i>Thermotogota</i> strains dataset (matched to the 64 <i>Thermotogota</i> genomes dataset) under the homogeneous model<br>
<b>- matched_ancseqs_infernalcmalign.output:</b> the Infernal alignment of extant and reconstructed ancestral sequences<br>
<b>- Tree_with_bootstrap_support_nodenames.pdf:</b> phylogenetic trees showing bootstrap support values and internal node labels<br>
<b><i>61toga_nonhomogeneous_16SrRNA_thermometer.zip</i></b>: data for 16S rRNA analyses of the 61 <i>Thermotogota</i> strains dataset (matched to the 64 <i>Thermotogota</i> genomes dataset) under the non-homogeneous model<br>
<b>- baseml_nonhomogeneous_matched16SrRNA_GC_Thermometer.ctl:</b> control file used for the baseml analyses<br>
<b>- reconstructed_seqs_infernalcm.aln:</b> the Infernal alignment of extant and ancestrally reconstructed sequences<br>
<b>- reconstructed_seqs.fna:</b> reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)<br>
<b>- baseml_output.txt:</b> raw output from the baseml analyses<br>
<b>- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py:</b> Python scripts for parsing the Infernal stem alignments and computing the GC content of the ancestral sequences<br>
<b><i>all_fams_amino_acid_seq.zip</i></b>: amino acid sequences for each of the 13,121 gene families (HOGs) with all taxa, including described isolates, other strains, MAGs and the outgroup (in FASTA format)<br>
<b><i>ancestral_aminoacid_reconstruction.zip</i></b>: data for the reconstruction of ancestral amino acid sequences and analyses of their IVYWRELGKP content for 294 gene families<br>
<b>- 294_aln</b> subfolder: alignments of the amino acid sequences of the gene families (in FASTA format)<br>
<b>- 294_trees</b> subfolder: rooted ML trees of the gene families (in Newick format)<br>
<b>- 294seqs_LCTA_reconstructed_proteome.fasta:</b> last common ancestor sequences for each gene family, inferred using the amino acid with the highest probability at each site (in FASTA format)<br>
<b>- 294seqs_LCTA_reconstructed_proteome_siteState_data.pkl:</b> a pickled dataframe containing the site and state probabilities for the LCTA node of each gene family<br>
<b>- list_of_families_for_lcta_reconstruction.txt:</b> a list of the HOG # for the 294 gene families<br>
<b>- ParseIQTreeASRStates.py:</b> script that reads in trees from the IQTree ASR process, roots them and identifies the LCTA node<br>
<b>- Weighted_IVYWREL_IVYWRELGKP_CvP_IQtreeASR.py:</b> script that computes the IVYWRELGKP value and estimates OGT for each gene family<br>
<b><i>RibosomalProteins.zip</i></b>: data for analyses of ribosomal proteins in 64 <i>Thermotogota</i> genomes<br>
<b>- concat_rprot_ALLTAXA_withOG.aln:</b> alignment of concatenated ribosomal proteins from 159 <i>Thermotogota</i> genomes and the outgroup (in FASTA format)<br>
<b>- concat_rprot_ALLTAXA_withOG_names.tree:</b> phylogenetic tree of concatenated ribosomal proteins from 159 <i>Thermotogota</i> genomes and the outgroup (in Newick format)<br>
<b>- concat_rprot_knownTempsTaxa.aln:</b> alignment of concatenated ribosomal proteins from 64 <i>Thermotogota</i> genomes and the outgroup (in FASTA format)<br>
<b>- concat_rprot_knownTempsTaxa_names.tree:</b> phylogenetic tree of concatenated ribosomal proteins from 64 <i>Thermotogota</i> genomes and the outgroup (the reference tree) (in Newick format)<br>
<b>- OrthoFinder_SpeciesTree_rooted_with_node_labels.tree:</b> the OrthoFinder version of the phylogeny of 159 <i>Thermotogota</i>, with nodes labeled (N#)<br>
<b><i>COUNT_input_matrix.zip</i></b>: matrix of gene counts per gene family used as the input to the COUNT gain/loss model<br>
<b><i>Pyseer_input_matrix.zip</i></b>: presence/absence data for all gene families used as the input to the Pyseer analyses<br>
<b><i>List_597_OGT_associated_families.txt</i></b>: a text file containing the gene family IDs (HOG #) of the 597 gene families found to be significantly associated with OGT<br>
<b><i>68_fams_alignments.zip</i></b>: alignments for the 68 gene families correlated with OGT. The alignments include only <i>Thermotogota</i> with an assigned temperature and the outgroup taxa (in FASTA format).<br>
<b><i>68_fams_expanded_alignments.zip</i></b>: alignments of the 68 expanded gene families (in FASTA format)<br>
<b><i>68_fams_expanded_trees.zip</i></b>: phylogenetic trees of the 68 expanded gene families (in Newick format)<br>
<b><i>68_fams_expanded_trees_pdf.zip</i></b>: phylogenetic trees of the 68 expanded gene families (in PDF format). The trees should be considered unrooted. Taxon names are colored green for in-gene-family <i>Thermotogota</i>, blue for outside-of-gene-family <i>Thermotogota</i>, and black for non-<i>Thermotogota</i> taxa.<br>
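Several of the archives listed above pair FASTA files with scripts that compute sequence composition (GC content for the reconstructed 16S rRNA sequences, IVYWRELGKP content for the reconstructed proteins). The sketch below is not taken from those scripts; it is a minimal illustration, under stated assumptions, of how such compositions could be computed from a FASTA file. The function names are hypothetical, gap characters are simply skipped, and no stem-region filtering, site-probability weighting, or OGT estimation is attempted.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {record_id: sequence}.

    The record id is the first whitespace-delimited token of the header.
    """
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def fraction_of(seq: str, alphabet: str) -> float:
    """Fraction of non-gap characters in seq that belong to alphabet."""
    residues = [c for c in seq.upper() if c not in "-."]
    if not residues:
        return 0.0
    return sum(c in alphabet for c in residues) / len(residues)


def gc_content(seq: str) -> float:
    """GC fraction of a nucleotide sequence (whole sequence, not stems only)."""
    return fraction_of(seq, "GC")


def ivywrelgkp_fraction(seq: str) -> float:
    """Unweighted fraction of I, V, Y, W, R, E, L, G, K, P residues in a protein."""
    return fraction_of(seq, "IVYWRELGKP")
```

A file such as reconstructed_seqs.fna could then be summarized with something like `with open(path) as fh: gc = {k: gc_content(v) for k, v in read_fasta(fh).items()}`, where `path` is wherever the archive was unpacked.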
Provider:
figshare
Created:
2024-09-13



