Sequence alignment for phylogenetic tree construction

Source: Figshare | Updated 2018-12-27 | Indexed 2026-04-08
Download link:
https://figshare.com/articles/Alignment_for_tree_of_life/6916298/2
Description:
This alignment was used to build a tree containing our MAGs, all taxa previously identified by Burgess et al. (2012) that have complete genomes available on NCBI (downloaded 2017-09-06), and all archaeal and bacterial genomes previously used in Hug et al. (2016). The genomes used in this tree and a mapping file can be found on figshare (genomes in Hug et al.'s tree of life (2016): https://doi.org/10.6084/m9.figshare.6863594.v1, https://doi.org/10.6084/m9.figshare.6863744.v2, https://doi.org/10.6084/m9.figshare.6863813.v1; genomes from Burgess et al. (2012): https://doi.org/10.6084/m9.figshare.6863798.v1).

PhyloSift builds an alignment of the concatenated sequences for a set of core markers for each taxon. We used 37 of these single-copy marker genes (ribosomal proteins S2 rpsB, S10 rpsJ, L1 rplA, L22, L4/L1e rplD, L2 rplB, S9 rpsI, L3 rplC, L14b/L23e rplN, S5, S19 rpsS, S7, L16/L10E rplP, S13 rpsM, L15, L25/L23, L6 rplF, L11 rplK, L5 rplE, S12/S23, L29, S3 rpsC, S11 rpsK, L10, S8, L18P/L5E, S15P/S13e, S17, L13 rplM, L24; and translation initiation factor IF-2, metalloendopeptidase, phenylalanyl-tRNA synthetase beta subunit, phenylalanyl-tRNA synthetase alpha subunit, tRNA pseudouridine synthase B, porphobilinogen deaminase, and ribonuclease HII; i.e., PhyloSift markers DNGNGWU00001 to DNGNGWU00040, excluding DNGNGWU00004, DNGNGWU00008 and DNGNGWU00038).

The amino acid alignment of these 37 concatenated genes was trimmed using trimAl v.1.2. Columns with gaps in more than 5% of the sequences were removed, as were taxa with less than 75% of the concatenated sequences. MAGs from ARK and ZAV that did not meet this threshold were manually kept in the alignment.
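
The marker selection and the two filtering thresholds described above can be illustrated with a minimal Python sketch. It is not the trimAl or PhyloSift implementation and is not taken from the dataset; it only restates the stated rules on an in-memory alignment. Several points are assumptions made for illustration: the alignment is held as a dict of name to gapped sequence with `-` as the gap character, the column filter is applied before the taxon filter, taxon coverage is measured as the fraction of non-gap positions, and the manually retained MAGs are recognised by hypothetical `ARK`/`ZAV` name prefixes.

```python
def marker_ids(first=1, last=40, excluded=(4, 8, 38)):
    """Marker names DNGNGWU00001..DNGNGWU00040 without the excluded numbers."""
    return [f"DNGNGWU{i:05d}" for i in range(first, last + 1) if i not in excluded]


def drop_gappy_columns(alignment, max_gap_fraction=0.05):
    """Remove columns in which more than max_gap_fraction of sequences have a gap."""
    names = list(alignment)
    n_seqs = len(names)
    length = len(alignment[names[0]])
    keep = [
        col for col in range(length)
        if sum(alignment[n][col] == "-" for n in names) / n_seqs <= max_gap_fraction
    ]
    return {n: "".join(alignment[n][c] for c in keep) for n in names}


def drop_sparse_taxa(alignment, min_coverage=0.75, always_keep=()):
    """Remove taxa below min_coverage, except names starting with a kept prefix.

    Coverage here is the non-gap fraction of the sequence (an assumption);
    always_keep holds hypothetical name prefixes for manually retained MAGs.
    """
    prefixes = tuple(always_keep)
    kept = {}
    for name, seq in alignment.items():
        coverage = sum(ch != "-" for ch in seq) / len(seq) if seq else 0.0
        if coverage >= min_coverage or name.startswith(prefixes):
            kept[name] = seq
    return kept
```

For example, `len(marker_ids())` gives 37, matching the number of marker genes listed, and `drop_sparse_taxa(aln, always_keep=("ARK", "ZAV"))` would retain low-coverage sequences whose names begin with those prefixes.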
Created: 2018-12-27