Supplementary Data for Emma Esterman's Honors Thesis, Dartmouth College, 2020.
Source: DataCite Commons. Updated: 2020-08-25. Indexed: 2024-07-28.
Download link:
https://figshare.com/articles/Supplementary_Data_for_Emma_Esterman_s_Honors_Thesis_Dartmouth_College_2020_/12354752
Description:
Supplementary_Tables.xlsx: Includes Supplementary Tables S1, S2, S3 and S4.

Supplementary Table S1: List of the 20 profiles used as queries for PSI-BLAST searches to find TerL homologs.

Supplementary Table S2: List of 73 viral TerLs with experimentally determined packaging mechanisms from Merrill et al. (2016) and Casjens and Gilcrease (2009) that were identified in our dataset of 252,614 TerL homologs. The virus taxID, the TerL accession number, and the accession number of the representative on the tree in Figure 5.3 are provided.

Supplementary Table S3: List of the 28 profiles used as queries for PSI-BLAST searches to find portal homologs.

Supplementary Table S4: List of viral portals with experimentally determined packaging mechanisms from Merrill et al. (2016) and Casjens and Gilcrease (2009), as well as portals with characterized structures in the Protein Data Bank, that were identified in our dataset of 288,836 portal homologs. The virus taxID, the portal accession number, the accession number of the representative on the tree in Figure 7.2, and the PDB ID of the portal structure (if determined) are provided.

254382_acc.zip: GenBank accession numbers of the 254,382 RcGTA TerL protein homologs that are taxonomically assigned to bacteria, archaea, or viruses and likely include both an ATPase (N-terminal) domain and a nuclease (C-terminal) domain.

252614_acc.zip: GenBank accession numbers of the 252,614 RcGTA TerL protein homologs that are represented by 11,051 TerLs on the phylogenetic tree in Figure 5.3.

11051_tree_acc.zip: GenBank accession numbers of the 11,051 amino acid sequences used for reconstruction of the tree shown in Figure 5.3.

341011_acc.zip: GenBank accession numbers of the 341,011 RcGTA portal protein homologs that are taxonomically assigned to bacteria, archaea, or viruses and had a top-scoring hit to a portal profile over the length criteria stated in Section 7.2.1.

288836_acc.zip: GenBank accession numbers of the 288,836 RcGTA portal protein homologs that are represented by 14,745 portals on the phylogenetic tree in Figure 7.2.

14745_tree_acc.zip: GenBank accession numbers of the 14,745 amino acid sequences used for reconstruction of the tree shown in Figure 7.2.

terminase_alignment.zip: Alignment of 11,051 amino acid sequences of TerLs. The alignment is in FASTA format.

terminase_alignment.trimmed.zip: Alignment of 11,051 amino acid sequences of TerLs, trimmed to remove all sites with more than 50% gaps and less than 10% amino acid similarity. The alignment is in FASTA format.

g1_alignment.zip: Alignment of 1,260 amino acid sequences found upstream of TerLs in RcGTA-like elements, with the consensus of the alignment as the first sequence. The alignment is in FASTA format.

g1_alignment.trimmed.zip: Alignment of 1,260 amino acid sequences found upstream of TerLs in RcGTA-like elements, with the consensus of the alignment as the first sequence, trimmed to remove all sites with more than 90% gaps. The alignment is in FASTA format.

portal_alignment.zip: Alignment of 14,745 amino acid sequences of portals. The alignment is in FASTA format.

portal_alignment.trimmed.zip: Alignment of 14,745 amino acid sequences of portals, trimmed to remove all sites with more than 50% gaps and less than 10% amino acid similarity. The alignment is in FASTA format.

Figure5.3_tree.zip: Phylogenetic tree of the 11,051 TerLs shown in Figure 5.3, in NEWICK format. aLRT support values are included as node labels.

context_tree.zip: UPGMA tree constructed from the similarity between 2,937 gene neighborhoods. The tree was used to identify g1-like proteins for the initial alignment.

Figure7.2_tree.zip: Phylogenetic tree of the 14,745 portals shown in Figure 7.2, in NEWICK format. aLRT support values are included as node labels.
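The trimmed alignments above are described as having gap-rich sites removed (for example, sites with more than 90% gaps in the g1 alignment). The record does not say which tool was used, so the following is only a minimal sketch of gap-fraction column trimming, not the author's procedure. The function names, the `max_gap_fraction` parameter, the treatment of `-` and `.` as gap characters, and the toy sequences are all assumptions made for illustration; the additional amino acid similarity criterion mentioned for the TerL and portal alignments is not implemented here.

```python
from io import StringIO


def read_fasta(handle):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def trim_gappy_columns(records, max_gap_fraction=0.9, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds max_gap_fraction.

    Hypothetical helper: the threshold semantics (strictly "more than")
    follow the wording of the dataset description, nothing more.
    """
    if not records:
        return []
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences are not aligned to equal length")
    n = len(records)
    keep = [
        col
        for col in range(length)
        if sum(seq[col] in gap_chars for _, seq in records) / n <= max_gap_fraction
    ]
    return [(name, "".join(seq[c] for c in keep)) for name, seq in records]


# Toy alignment (invented for illustration): the third column is all gaps.
toy = StringIO(">a\nMK-L\n>b\nM--L\n>c\nMR-V\n")
trimmed = trim_gappy_columns(read_fasta(toy), max_gap_fraction=0.5)
for name, seq in trimmed:
    print(f">{name}\n{seq}")
```

To apply the same idea to one of the unzipped alignment files, open the FASTA file and pass the file handle to `read_fasta` in place of the toy `StringIO` object.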
Provider:
figshare
Created:
2020-05-21



