Supplementary Data for Farrell, Nesbø and Zhaxybayeva (2024)
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Supplementary_Data_for_Farrell_Nesb_and_Zhaxybayeva_2024_/26880394
Resource description:
60toga_16SrRNA_tree.zip: Alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 60 Thermotogota species dataset
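The alignments distributed in these archives are plain FASTA files, so they can be inspected without extra dependencies. The following is a minimal sketch, not part of the dataset. It assumes standard (possibly line-wrapped) FASTA records. `read_fasta` and `check_alignment` are hypothetical helper names introduced here for illustration.

```python
import io
from typing import Dict, List, Optional, TextIO


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA records into {record_id: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are concatenated, so wrapped records are supported.
    """
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def check_alignment(records: Dict[str, str]) -> int:
    """Return the alignment length, or raise if sequences differ in length."""
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) != 1:
        raise ValueError(f"expected one common length, found {sorted(lengths)}")
    return lengths.pop()


# Tiny inline example (toy sequences, not taken from the dataset).
demo = read_fasta(io.StringIO(">seqA some description\nAC-G\nT\n>seqB\nACGGT\n"))
print(demo, check_alignment(demo))
```

With a real file, `with open(path) as fh: aln = read_fasta(fh)` followed by `check_alignment(aln)` confirms that every aligned sequence has the same length. Here `path` stands for whichever alignment file is extracted from the zip, because the member names are not listed above.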
60toga_homogeneous_16SrRNA_thermometer.zip: data for 16S rRNA analyses of the 60 Thermotogota species dataset under the homogeneous model
- 16SrRNA_cmalign_withAllASR.output: the Infernal alignment of extant and reconstructed ancestral sequences
- allNodes_with_ASRgc_and_estimatedTemps.txt: estimated stem GC content (second column) and estimated OGT (third column) for all internal nodes (first column)
- tree_with_bootstrap_support_nodenames.pdf: phylogenetic trees showing bootstrap support values and internal node labels
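The three-column layout of allNodes_with_ASRgc_and_estimatedTemps.txt (node, stem GC content, OGT) can be loaded into a lookup table keyed by node label. This is a minimal sketch under stated assumptions: the columns are whitespace-separated, and any header row is detected and skipped because its numeric fields fail to parse. The units of the GC column (fraction or percent) are not stated above. The function name, node labels and values in the example are hypothetical.

```python
import io
from typing import Dict, TextIO, Tuple


def read_node_estimates(handle: TextIO) -> Dict[str, Tuple[float, float]]:
    """Map internal-node label -> (estimated stem GC content, estimated OGT).

    Assumes whitespace-separated columns: node label, GC content, OGT.
    Rows whose 2nd or 3rd field is not numeric (e.g. a header) are skipped.
    """
    estimates: Dict[str, Tuple[float, float]] = {}
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or short line
        try:
            gc, ogt = float(fields[1]), float(fields[2])
        except ValueError:
            continue  # header row or malformed line
        estimates[fields[0]] = (gc, ogt)
    return estimates


# Toy input with hypothetical node labels and values, not real results.
toy = io.StringIO("node\tGC\tOGT\nnodeX\t0.62\t75.0\nnodeY\t0.58\t60.5\n")
est = read_node_estimates(toy)
warmest = max(est, key=lambda node: est[node][1])
print(warmest, est[warmest])
```

The node labels in the real file should correspond to the internal node labels drawn in tree_with_bootstrap_support_nodenames.pdf, which makes it possible to look up the estimate for a node of interest.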
60toga_nonhomogeneous_16SrRNA_thermometer.zip: data for 16S rRNA analyses of the 60 Thermotogota species dataset under the non-homogeneous model
- baseml_nonhomogeneous_16SrRNA_GC_Thermometer.ctl: control file used for the baseml analyses
- reconstructed_seqs_infernalcm.aln: the Infernal alignment file of extant and ancestrally reconstructed sequences
- reconstructed_seqs.fna: reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)
- baseml_output.txt: raw output from the baseml analyses
- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py: Python scripts for parsing the Infernal stem alignments and computing the ancestral sequence GC content
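The two scripts listed above parse stem regions from the Infernal alignment and compute the GC content of the ancestral sequences. Their code is not reproduced here. The sketch below is an independent illustration of the same idea, with these assumptions: the alignment is in Stockholm format with a `#=GC SS_cons` consensus-structure line in WUSS notation, and only columns annotated with bracket characters (`<>`, `()`, `[]`, `{}`) count as stem positions, so pseudoknot letters are ignored. `read_stockholm`, `stem_gc` and the sequence name `anc1` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# WUSS base-pair brackets; pseudoknot letters (Aa, Bb, ...) are ignored here.
PAIR_CHARS = frozenset("<>()[]{}")
NUCLEOTIDES = frozenset("ACGTU")


def read_stockholm(lines: Iterable[str]) -> Tuple[Dict[str, str], str]:
    """Return ({seq_id: aligned_seq}, SS_cons), joining interleaved blocks."""
    seqs: Dict[str, str] = {}
    ss_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#=GC SS_cons"):
            ss_parts.append(line.split()[2])
        elif not line or line.startswith("#") or line.startswith("//"):
            continue  # blank lines, other annotation, end-of-alignment marker
        else:
            seq_id, chunk = line.split()[:2]
            seqs[seq_id] = seqs.get(seq_id, "") + chunk
    return seqs, "".join(ss_parts)


def stem_gc(aligned_seq: str, ss_cons: str) -> float:
    """GC fraction over residues in base-paired (stem) consensus columns.

    Gaps and ambiguity codes in stem columns are excluded from the count.
    """
    if len(aligned_seq) != len(ss_cons):
        raise ValueError("sequence and SS_cons lengths differ")
    bases = [
        aligned_seq[i].upper()
        for i, symbol in enumerate(ss_cons)
        if symbol in PAIR_CHARS and aligned_seq[i].upper() in NUCLEOTIDES
    ]
    if not bases:
        raise ValueError("no nucleotides found in stem columns")
    return sum(base in "GC" for base in bases) / len(bases)


# Toy two-block Stockholm alignment (not from the dataset).
toy = [
    "# STOCKHOLM 1.0",
    "anc1  GCAA",
    "#=GC SS_cons  <<..",
    "",
    "anc1  GU",
    "#=GC SS_cons  >>",
    "//",
]
seqs, ss = read_stockholm(toy)
print({name: stem_gc(seq, ss) for name, seq in seqs.items()})
```

Whether the archive's scripts treat gaps, ambiguity codes or pseudoknots the same way is not stated, so results from this sketch may differ from the published estimates.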
61toga_16SrRNA_tree.zip: Alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 61 Thermotogota strains dataset.
61toga_homogeneous_16SrRNA_thermometer.zip: data for 16S rRNA analyses of the 61 Thermotogota strains dataset (matched to the 64 Thermotogota genomes dataset) under the homogeneous model
- matched_ancseqs_infernalcmalign.output: the Infernal alignment of extant and reconstructed ancestral sequences
- Tree_with_bootstrap_support_nodenames.pdf: phylogenetic trees showing bootstrap support values and internal node labels
61toga_nonhomogeneous_16SrRNA_thermometer.zip: data for 16S rRNA analyses of the 61 Thermotogota strains dataset (matched to the 64 Thermotogota genomes dataset) under the non-homogeneous model
- baseml_nonhomogeneous_matched16SrRNA_GC_Thermometer.ctl: control file used for the baseml analyses
- reconstructed_seqs_infernalcm.aln: the Infernal alignment file of extant and ancestrally reconstructed sequences
- reconstructed_seqs.fna: reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)
- baseml_output.txt: raw output from the baseml analyses
- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py: Python scripts for parsing the Infernal stem alignments and computing the ancestral sequence GC content
all_fams_amino_acid_seq.zip: amino acid sequences for each of the 13,121 gene families (HOGs) with all taxa, including described isolates, other strains, MAGs and the outgroup (in FASTA format).
ancestral_aminoacid_reconstruction.zip: data for the reconstruction of ancestral amino acid sequences of 294 gene families and the analysis of their IVYWRELGKP content
- 294_aln subfolder: alignments of amino acid sequences of the gene families (in FASTA format)
- 294_trees subfolder: rooted ML trees of the gene families (in Newick format)
- 294seqs_LCTA_reconstructed_proteome.fasta: last common ancestor sequences for each gene family, inferred by taking the amino acid with the highest probability at each site (in FASTA format)
- 294seqs_LCTA_reconstructed_proteome_siteState_data.pkl: a pickled DataFrame containing the site and state probabilities for the LCTA node of each gene family
- list_of_families_for_lcta_reconstruction.txt: a list of the HOG numbers of the 294 gene families
- ParseIQTreeASRStates.py: script that reads the trees produced by the IQ-TREE ancestral sequence reconstruction (ASR), roots them and identifies the LCTA node
- Weighted_IVYWREL_IVYWRELGKP_CvP_IQtreeASR.py: script that computes the IVYWRELGKP value and estimates the OGT for each gene family
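The IVYWREL and IVYWRELGKP values are fractions of residues belonging to those amino acid sets. The sketch below computes the plain fraction for a sequence and a probability-weighted variant from per-site state probabilities. The details of how Weighted_IVYWREL_IVYWRELGKP_CvP_IQtreeASR.py weights sites and converts the value into an OGT are not described here. The weighting below, which averages each site's probability of being in the set, is therefore an assumption, and no OGT regression is included. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Sequence

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
IVYWREL = frozenset("IVYWREL")
IVYWRELGKP = frozenset("IVYWRELGKP")


def residue_fraction(seq: str, alphabet: FrozenSet[str] = IVYWRELGKP) -> float:
    """Fraction of standard amino acids in `seq` that belong to `alphabet`.

    Gaps, stop symbols and non-standard letters (e.g. X) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in STANDARD_AA]
    if not residues:
        raise ValueError("no standard amino acids in sequence")
    return sum(aa in alphabet for aa in residues) / len(residues)


def expected_fraction(
    site_probs: Sequence[Dict[str, float]],
    alphabet: FrozenSet[str] = IVYWRELGKP,
) -> float:
    """Probability-weighted fraction: mean over sites of P(state in alphabet).

    `site_probs` holds one {amino_acid: probability} mapping per site
    (an assumed layout, not necessarily that of the .pkl file).
    """
    if not site_probs:
        raise ValueError("no sites given")
    total = sum(
        sum(p for aa, p in site.items() if aa in alphabet) for site in site_probs
    )
    return total / len(site_probs)


# Toy inputs (not from the dataset).
print(residue_fraction("IVGA-", IVYWREL), residue_fraction("IVGA-"))
print(expected_fraction([{"I": 0.9, "A": 0.1}, {"A": 1.0}], IVYWREL))
```

The plain fraction corresponds to scoring the most-probable (FASTA) sequences, while the weighted variant uses the full per-site probabilities, so uncertain sites contribute only partially.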
RibosomalProteins.zip: data for analyses of ribosomal proteins in 64 Thermotogota genomes
- concat_rprot_ALLTAXA_withOG.aln: alignment of concatenated ribosomal proteins from 159 Thermotogota genomes and the outgroup (in FASTA format)
- concat_rprot_ALLTAXA_withOG_names.tree: phylogenetic tree of concatenated ribosomal proteins from 159 Thermotogota genomes and the outgroup (in Newick format)
- concat_rprot_knownTempsTaxa.aln: alignment of concatenated ribosomal proteins from 64 Thermotogota genomes and the outgroup (in FASTA format)
- concat_rprot_knownTempsTaxa_names.tree: phylogenetic tree of concatenated ribosomal proteins from 64 Thermotogota genomes and the outgroup (the reference tree) (in Newick format)
- OrthoFinder_SpeciesTree_rooted_with_node_labels.tree: the OrthoFinder version of the 159-genome Thermotogota phylogeny, with nodes labeled N#
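To check which taxa a Newick tree contains, or to list node labels such as OrthoFinder's N#, the labels can be pulled out with a small tokenizer. This is a minimal sketch assuming unquoted labels. A label directly after `)` is treated as an internal-node label, so bootstrap values written in that position would also be reported as internal labels. The function name and the toy tree are hypothetical.

```python
from typing import List, Tuple


def newick_labels(newick: str) -> Tuple[List[str], List[str]]:
    """Return (leaf_labels, internal_labels) from an unquoted Newick string.

    A label right after ')' belongs to an internal node; any other label is a
    leaf. Branch lengths after ':' are skipped. Quoted labels are not handled.
    """
    leaves: List[str] = []
    internal: List[str] = []
    prev = ""  # last structural character seen
    i, n = 0, len(newick)
    while i < n:
        char = newick[i]
        if char in "(),;":
            prev = char
            i += 1
        elif char == ":":
            i += 1
            while i < n and newick[i] not in "(),;":
                i += 1  # skip the branch length
        elif char.isspace():
            i += 1
        else:
            j = i
            while j < n and newick[j] not in "(),:;":
                j += 1
            label = newick[i:j].strip()
            (internal if prev == ")" else leaves).append(label)
            i = j
    return leaves, internal


# Toy tree (not from the dataset).
leaves, internal = newick_labels("((taxonA:0.1,taxonB:0.2)N1:0.3,taxonC:0.4)N0;")
print(leaves, internal)
```

Comparing the leaf set of concat_rprot_knownTempsTaxa_names.tree with the sequence IDs of concat_rprot_knownTempsTaxa.aln is one quick way to confirm that tree and alignment cover the same taxa.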
COUNT_input_matrix.zip: matrix of gene counts per gene family used as the input to the COUNT gain/loss model
Pyseer_input_matrix.zip: presence/absence data for all gene families used as the input to the Pyseer analyses
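A gene-count matrix (as used for COUNT) and a presence/absence matrix (as used for Pyseer) are related by thresholding counts at zero. The sketch below assumes that the count matrix is tab-separated, with a header row of taxon names, gene families as rows and the family ID in the first column. The actual layouts of the two files in this dataset are not described above, so the column order may need adjusting. The function name and the HOG-style IDs in the example are hypothetical.

```python
import csv
import io
from typing import TextIO


def counts_to_presence(src: TextIO, dst: TextIO) -> int:
    """Convert a tab-separated gene-count table to 0/1 presence/absence.

    Assumes: first row = header (family column + taxon names),
    first column = family ID, remaining columns = integer counts.
    Returns the number of families written.
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    header = next(reader)
    writer.writerow(header)
    written = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        family, counts = row[0], row[1:]
        writer.writerow([family] + ["1" if int(c) > 0 else "0" for c in counts])
        written += 1
    return written


# Toy matrix with hypothetical family IDs and taxa.
src = io.StringIO("Family\ttaxonA\ttaxonB\nHOG_x\t2\t0\nHOG_y\t0\t1\n")
dst = io.StringIO()
print(counts_to_presence(src, dst))
print(dst.getvalue())
```

Thresholding discards copy-number information, which a gain/loss model such as COUNT can use but a presence/absence association test does not need.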
List_597_OGT_associated_families.txt: A text file containing the gene family IDs (HOG numbers) of the 597 gene families found to be significantly associated with OGT.
68_fams_alignments.zip: Alignments of the 68 gene families correlated with OGT. The alignments include only the Thermotogota taxa with assigned temperatures, plus the outgroup taxa (in FASTA format).
68_fams_expanded_alignments.zip: Alignments of the 68 expanded gene families (in FASTA format).
68_fams_expanded_trees.zip: Phylogenetic trees of the 68 expanded gene families (in Newick format).
68_fams_expanded_trees_pdf.zip: Phylogenetic trees of the 68 expanded gene families (in PDF format). The trees should be treated as unrooted. Taxon names are colored green for in-gene-family Thermotogota, blue for outside-of-gene-family Thermotogota, and black for non-Thermotogota taxa.
Created:
2024-09-13



