Supplementary Data for Kogay et al. (2019)
Source: DataCite Commons. Updated: 2020-08-27. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/Supplementary_Data_for_Kogay_et_al_2019_/8796419/1
Description:
This dataset contains supplementary figures and tables, sequence alignments, phylogenetic trees, and outlier removal calculations used in the bioinformatic analyses presented in:

Roman Kogay, Taylor B. Neely, Daniel P. Birnbaum, Camille R. Hankel, Migun Shakya, and Olga Zhaxybayeva. “Machine-learning classification suggests that many alphaproteobacterial prophages may instead be gene transfer agents”, BioRxiv, 2019. (BIORXIV/2019/697243; available at https://www.biorxiv.org/content/10.1101/697243v1)

File Contents:

Supplementary_Figures.pdf: Supplementary Figures S1 and S2 in the manuscript.

Supplementary_Tables.zip: Supplementary Tables S1-S15 in the manuscript.

Alignments_for_weight_assignment_GTAs.zip: Alignments of 'true GTA' sequences in the training dataset. These alignments were used to generate pairwise phylogenetic distances for the weighting scheme. The alignments are in FASTA format. The filename prefix (g2, …, g15) refers to the RcGTA gene name (see Supplementary Table S1).

Alignments_for_weight_assignment_viruses.zip: Alignments of 'true virus' sequences in the training dataset. These alignments were used to generate pairwise phylogenetic distances for the weighting scheme. The alignments are in FASTA format. The filename prefix (g2, …, g15) refers to the RcGTA gene name (see Supplementary Table S1).

Alignments_for_removal_of_outliers.zip: Alignments of 'true GTA' and 'true virus' sequences in the training datasets. Pairwise phylogenetic distances calculated from these alignments were used to remove GTA homologs that are more closely related to viruses than to other GTAs, and to investigate the lower accuracies obtained for g6 and g12 (Supplementary Table S11). The alignments are in FASTA format. The filename prefix (g2, …, g15) refers to the RcGTA gene name (see Supplementary Table S1).

Outlier_removal.xlsx: Calculations to identify GTAs that are more closely related to viruses than to other GTAs. The removed sequences are highlighted.

Reference_phylogenetic_tree_reconstruction.zip: Concatenated alignment of 83 marker genes in 1,423 taxa in PHYLIP format (concatenated_83markers.phy); information about partitions and substitution models used in phylogenetic reconstruction (partitions.txt); and phylogenetic tree in Newick format (1423_alphaproteobacteria_reference_tree.newick).
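
As an illustration of the outlier-removal idea described above, the following is a minimal sketch and not the authors' actual calculation, which is given in Outlier_removal.xlsx. It assumes that "more closely related to viruses than to other GTAs" can be approximated by comparing mean pairwise distances. The function name `find_gta_outliers`, the sequence identifiers, and the toy distance values are all hypothetical.

```python
from statistics import mean


def find_gta_outliers(distances, gta_ids, virus_ids):
    """Return GTA ids whose mean distance to viruses is smaller than
    their mean distance to the other GTAs.

    ASSUMPTION: comparing mean distances is an illustrative criterion,
    not necessarily the rule used in the dataset.
    """
    def dist(a, b):
        # Pairwise distances are symmetric; accept either key order.
        return distances[(a, b)] if (a, b) in distances else distances[(b, a)]

    outliers = []
    for gta in gta_ids:
        other_gtas = [x for x in gta_ids if x != gta]
        if not other_gtas or not virus_ids:
            continue  # nothing to compare against
        mean_to_gtas = mean(dist(gta, x) for x in other_gtas)
        mean_to_viruses = mean(dist(gta, v) for v in virus_ids)
        if mean_to_viruses < mean_to_gtas:
            outliers.append(gta)
    return outliers


# Hypothetical toy distances (not taken from the dataset).
toy_distances = {
    ("gta1", "gta2"): 0.1,
    ("gta1", "gta3"): 0.9,
    ("gta2", "gta3"): 0.8,
    ("gta1", "vir1"): 1.0,
    ("gta2", "vir1"): 1.1,
    ("gta3", "vir1"): 0.2,
}

print(find_gta_outliers(toy_distances, ["gta1", "gta2", "gta3"], ["vir1"]))
```

In this toy input only `gta3` lies closer to the virus sequence than to the other GTAs, so it is the one flagged for removal.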
Provider:
figshare
Created:
2019-07-15



