Supplementary data for Esterman et al (2020)

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Supplementary_data_for_Esterman_et_al_2020_/12191691
Description:
This dataset contains supplementary data for the bioinformatic analyses presented in: Emma Esterman, Yuri I. Wolf, Roman Kogay, Eugene V. Koonin, and Olga Zhaxybayeva, "Phylogenetic evidence of headful packaging strategy in gene transfer agents" (under review).

File descriptions:

254382_acc.zip: GenBank accession numbers of the RcGTA TerL protein homologs that are taxonomically assigned to bacteria, archaea, or viruses and likely include both an ATPase (N-terminal) domain and a nuclease (C-terminal) domain.

11051_tree_acc.zip: GenBank accession numbers of the amino acid sequences used to reconstruct the tree shown in Figure 1.

252614_acc.zip: GenBank accession numbers of the RcGTA TerL protein homologs that are represented by the 11,051 TerLs on the phylogenetic tree.

terminase_alignment.zip: Alignment of 11,051 amino acid sequences of terminases, in FASTA format.

terminase_alignment.trimmed.zip: Alignment of 11,051 amino acid sequences of TerLs, trimmed to remove all sites with more than 50% gaps and less than 10% amino acid similarity, in FASTA format.

Figure1_tree.zip: Phylogenetic tree of the 11,051 TerLs shown in Figure 1, in NEWICK format. aLRT support values are included as node labels.

terL_subtree_aligned.zip: Alignment of 616 amino acid sequences of TerLs used to detect sites differentiating viruses and RcGTA-like proteins, in FASTA format. The 616 TerLs are from the subtree shown in Figure 2.

IQ-Tree_Alignments.zip: Alignments of 342 and 346 amino acid sequences of TerLs used to validate the obtained phylogenetic patterns with IQ-TREE, in FASTA format.

IQ-TREE_trees.zip: Phylogenetic trees of 342 and 346 TerLs reconstructed in IQ-TREE, in NEWICK format. Ultrafast bootstrap support values are included as node labels.
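
The gap criterion mentioned for terminase_alignment.trimmed.zip can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names read_fasta and drop_gappy_columns are hypothetical, the gap character is assumed to be "-", and only the "more than 50% gaps" part of the rule is shown, because the description does not define how amino acid similarity was measured.

```python
from typing import Dict


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence_id: sequence}."""
    records: Dict[str, str] = {}
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records[name] = "".join(chunks)
            # Use the first whitespace-delimited token as the ID.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def drop_gappy_columns(aln: Dict[str, str],
                       max_gap_fraction: float = 0.5) -> Dict[str, str]:
    """Remove alignment columns whose gap fraction exceeds the threshold.

    Assumption: gaps are encoded as '-'. The similarity criterion from
    the dataset description is NOT implemented here.
    """
    seqs = list(aln.values())
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(seqs)
    # Keep a column when at most max_gap_fraction of its rows are gaps.
    keep = [
        i for i in range(length)
        if sum(s[i] == "-" for s in seqs) / n <= max_gap_fraction
    ]
    return {name: "".join(seq[i] for i in keep) for name, seq in aln.items()}


# Toy alignment (made-up sequences, not taken from the dataset).
example = ">a\nMK-L\n>b\nM--L\n>c\nMK-L\n"
trimmed = drop_gappy_columns(read_fasta(example))
```

In the toy alignment, the third column is all gaps and is dropped, while the second column (one gap in three sequences) is kept. For the real 11,051-sequence alignment, an established library or trimming tool would likely be a better choice than this column-by-column loop.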

Created: 2020-10-08