
Supplementary Data for Farrell, Nesbø and Zhaxybayeva (2024)

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://figshare.com/articles/dataset/Supplementary_Data_for_Farrell_Nesb_and_Zhaxybayeva_2024_/26880394
Description:
60toga_16SrRNA_tree.zip: Alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 60 Thermotogota species dataset.

60toga_homogeneous_16SrRNA_thermometer.zip: Data for 16S rRNA analyses of the 60 Thermotogota species dataset under the homogeneous model.
- 16SrRNA_cmalign_withAllASR.output: the Infernal alignment of extant and reconstructed ancestral sequences
- allNodes_with_ASRgc_and_estimatedTemps.txt: estimated stem GC content (second column) and estimated OGT (third column) for all internal nodes (first column)
- tree_with_bootstrap_support_nodenames.pdf: phylogenetic trees showing bootstrap support values and internal node labels

60toga_nonhomogeneous_16SrRNA_thermometer.zip: Data for 16S rRNA analyses of the 60 Thermotogota species dataset under the non-homogeneous model.
- baseml_nonhomogeneous_16SrRNA_GC_Thermometer.ctl: control file used for baseml analyses
- reconstructed_seqs_infernalcm.aln: the Infernal alignment file of extant and ancestrally reconstructed sequences
- reconstructed_seqs.fna: reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)
- baseml_output.txt: raw output from the baseml analyses
- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py: Python scripts for parsing the Infernal stem alignments and computing the ancestral sequence GC content

61toga_16SrRNA_tree.zip: Alignment (in FASTA format) and phylogenetic tree (in Newick format) for the 61 Thermotogota strains dataset.

61toga_homogeneous_16SrRNA_thermometer.zip: Data for 16S rRNA analyses of the 61 Thermotogota strains dataset (matched to the 64 Thermotogota genomes dataset) under the homogeneous model.
- matched_ancseqs_infernalcmalign.output: the Infernal alignment of extant and reconstructed ancestral sequences
- Tree_with_bootstrap_support_nodenames.pdf: phylogenetic trees showing bootstrap support values and internal node labels

61toga_nonhomogeneous_16SrRNA_thermometer.zip: Data for 16S rRNA analyses of the 61 Thermotogota strains dataset (matched to the 64 Thermotogota genomes dataset) under the non-homogeneous model.
- baseml_nonhomogeneous_matched16SrRNA_GC_Thermometer.ctl: control file used for baseml analyses
- reconstructed_seqs_infernalcm.aln: the Infernal alignment file of extant and ancestrally reconstructed sequences
- reconstructed_seqs.fna: reconstructed nucleotide sequences at the ancestral nodes (in FASTA format)
- baseml_output.txt: raw output from the baseml analyses
- 2_parse_stem_regions_from_infernal_aln.py and 3_compute_ancestor_seq_gc.py: Python scripts for parsing the Infernal stem alignments and computing the ancestral sequence GC content

all_fams_amino_acid_seq.zip: Amino acid sequences for each of the 13,121 gene families (HOGs) with all taxa, including described isolates, other strains, MAGs and the outgroup (in FASTA format).

ancestral_aminoacid_reconstruction.zip: Data for reconstruction of ancestral amino acid sequences and analyses of their IVYWRELGKP content for 294 gene families.
- 294_aln subfolder: alignments of amino acid sequences of the gene families (in FASTA format)
- 294_trees subfolder: rooted ML trees of the gene families (in Newick format)
- 294seqs_LCTA_reconstructed_proteome.fasta: last common ancestor sequences for each gene family, inferred using the amino acid with the highest probability for each site (in FASTA format)
- 294seqs_LCTA_reconstructed_proteome_siteState_data.pkl: a pickled dataframe containing the site and state probabilities for the LCTA node for each gene family
- list_of_families_for_lcta_reconstruction.txt: a list of the HOG# for the 294 gene families
- ParseIQTreeASRStates.py: script to read in trees from the IQTree ASR process, root them and identify the LCTA node
- Weighted_IVYWREL_IVYWRELGKP_CvP_IQtreeASR.py: script that computes the IVYWRELGKP value and estimates OGT for each gene family

RibosomalProteins.zip: Data for analyses of ribosomal proteins in 64 Thermotogota genomes.
- concat_rprot_ALLTAXA_withOG.aln: alignment of concatenated ribosomal proteins from 159 Thermotogota genomes and the outgroup (in FASTA format)
- concat_rprot_ALLTAXA_withOG_names.tree: phylogenetic tree of concatenated ribosomal proteins from 159 Thermotogota genomes and the outgroup (in Newick format)
- concat_rprot_knownTempsTaxa.aln: alignment of concatenated ribosomal proteins from 64 Thermotogota genomes and the outgroup (in FASTA format)
- concat_rprot_knownTempsTaxa_names.tree: phylogenetic tree of concatenated ribosomal proteins from 64 Thermotogota genomes and the outgroup (the reference tree) (in Newick format)
- OrthoFinder_SpeciesTree_rooted_with_node_labels.tree: the OrthoFinder version of the 159 Thermotogota phylogeny with nodes labeled (N#)

COUNT_input_matrix.zip: Matrix of gene counts per gene family used as the input to the COUNT gain/loss model.

Pyseer_input_matrix.zip: Presence/absence data for all gene families used as the input to the Pyseer analyses.

List_597_OGT_associated_families.txt: A text file containing gene family IDs (HOG #) of the 597 gene families found to be significantly associated with OGT.

68_fams_alignments.zip: Alignments for the 68 gene families correlated with OGT. Alignments include only Thermotogota with assigned temperature and the outgroup taxa (in FASTA format).

68_fams_expanded_alignments.zip: Alignments of the 68 expanded gene families (in FASTA format).

68_fams_expanded_trees.zip: Phylogenetic trees of the 68 expanded gene families (in Newick format).

68_fams_expanded_trees_pdf.zip: Phylogenetic trees of the 68 expanded gene families (in PDF format). The trees should be considered unrooted. Taxa names are colored green for in-gene-family Thermotogota, blue for outside-of-gene-family Thermotogota, and black for non-Thermotogota taxa.
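
Several of the files above are FASTA files whose sequences are summarized by composition: GC content for the reconstructed 16S rRNA sequences, and IVYWRELGKP content for the reconstructed proteins. The following is a minimal Python sketch of those two statistics, not the authors' scripts. It assumes plain FASTA input, counts over the whole sequence rather than over stem regions only, and applies no probability weighting; the helper names (`read_fasta`, `residue_fraction`) and the toy sequences are hypothetical.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def residue_fraction(seq, residues):
    """Fraction of non-gap characters in seq that belong to `residues`."""
    # Drop alignment gap characters ('-' and '.') before counting.
    seq = seq.upper().replace("-", "").replace(".", "")
    if not seq:
        return 0.0
    return sum(ch in residues for ch in seq) / len(seq)


def gc_content(seq):
    """Whole-sequence GC fraction (assumption: not restricted to stems)."""
    return residue_fraction(seq, set("GC"))


def ivywrelgkp_fraction(seq):
    """Unweighted fraction of I, V, Y, W, R, E, L, G, K and P residues."""
    return residue_fraction(seq, set("IVYWRELGKP"))


# Toy inputs standing in for files such as reconstructed_seqs.fna and
# 294seqs_LCTA_reconstructed_proteome.fasta (the sequences are made up).
toy_nucleotides = StringIO(">node1\nGGCC-ATTT\n>node2\nAAAA\n")
toy_proteins = StringIO(">fam1\nIVYWMMMMMM\n")

gc_by_node = {name: gc_content(seq) for name, seq in read_fasta(toy_nucleotides)}
ivywrelgkp_by_family = {
    name: ivywrelgkp_fraction(seq) for name, seq in read_fasta(toy_proteins)
}
```

To use this on the real files, replace the `StringIO` objects with `open(path)` handles. Converting either statistic to an OGT estimate requires a calibration that is not given in this description, so the sketch stops at the composition values.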
Created:
2024-09-13