
The results of the comparative taxonomic and genomic analysis of the viromes from Lake Baikal and other freshwater bodies

Indexed by NIAID Data Ecosystem: 2026-03-12
Download link:
https://figshare.com/articles/dataset/The_results_of_the_comparative_taxonomic_and_genomic_analysis_of_the_viromes_from_Lake_Baikal_and_other_freshwater_bodies/12814637
Description:
The dataset contains tables and files presenting the results of a comparative genomic, taxonomic and functional analysis of viral communities from Lake Baikal and other freshwater lakes [1-8]. Sequences (metagenomic reads) were taxonomically identified with BLASTn [9] and DIAMOND [10] against the NCBI RefSeq complete viral genome and proteome databases [11]. De novo assembly was performed with metaSPAdes [12], the metagenomic assembler of SPAdes 3.13.0. VirSorter [13] was used to identify viral scaffolds and viral proteins. Viral proteins in the Baikal viromes were functionally annotated using the COG (Clusters of Orthologous Groups) [14] and KEGG pathway [15] classification groups.

For taxonomic analysis, DNA reads were compared with complete viral genomes (BLASTn) and with complete viral proteomes (DIAMOND) on five high-performance nodes, each with two Intel Xeon E5-2695 v4 "Broadwell" CPUs (36 cores in total) and 128 GB RAM. The total calculation time was about 24 hours for BLASTn and about 3 hours for DIAMOND. The paired reads were assembled on an AMD Opteron 6278 system (8 CPUs, 64 cores in total, 945 GB RAM); the total assembly time was about 399 hours.

Full list of Supplementary Materials

Figures
Figure S1: Complete scheme of the bioinformatic analysis. Analysis stages are highlighted with red boxes, resulting datasets are shown in blue with a dashed outline, and the databases (DB) used are shown in purple.
Figure S2: Dominant viral families in the investigated Baikal viromes. Percentages greater than 1% are shown in the diagrams.
Figure S3: General functional annotation of the Baikal virome datasets using the COG (A) and KEGG pathway (B) databases.
Figure S4: Phyla of the hosts predicted for the identified Baikal viruses using the Virus-Host database (A) and the VirHostMatcher-Net software (B).

Tables
Table S1: List and taxonomy of virotypes identified in the analyzed freshwater viromes (initial number of reads per virotype).
Table S2: List and taxonomy of virotypes identified in the analyzed freshwater viromes (number of reads per virotype normalized to genome length).
Table S3: List and taxonomy of virotypes identified in the analyzed freshwater viromes (number of reads per virotype normalized to genome length, for the 95% dominant pool).
Table S4: Taxonomic identification of the Baikal viral scaffolds, similarity between the detected ORFs and RefSeq proteins, and the number of virome reads per scaffold.
Table S5: Main and secondary KO (KEGG Orthology) functional categories of the predicted viral proteins, with the number of reads related to each function in the Baikal samples.
Table S6: COG (Clusters of Orthologous Groups) classification of the viral proteins identified in the Baikal viromes (columns give the number of reads per COG functional category for each sample).
Table S7: Predicted viral proteins or enzymes identified with the COG database in the Baikal viromes (columns give the number of reads for each protein in each sample).
Table S8: Auxiliary metabolic genes (AMGs) found among the predicted proteins in the Baikal viromes.
Table S9: Known hosts of the Baikal virotypes identified using the Virus-Host database.
Table S10: Hosts predicted for the identified Baikal viruses using the VirHostMatcher-Net software.

Files
File S1: Scaffolds obtained with metaSPAdes.
File S2: Viral scaffolds identified with VirSorter.
File S3: Viral proteins identified with VirSorter.

References
1. Butina, T. V.; Bukin, Y. S.; Krasnopeev, A. S.; Belykh, O. I.; Tupikin, A. E.; Kabilov, M. R.; Sakirko, V.; Belikov, S. I. Estimate of the diversity of viral and bacterial assemblage in the coastal water of Lake Baikal. FEMS Microbiol. Lett. 2019, 366, fnz094.
2. Potapov, S. A.; Tikhonova, I. V.; Krasnopeev, A. Y.; Kabilov, M. R.; Tupikin, A. E.; Chebunina, N. S.; Zhuchenko, N. A.; Belykh, O. I. Metagenomic analysis of virioplankton from the pelagic zone of Lake Baikal. Viruses 2019, 11, 991.
3. Watkins, S. C.; Kuehnle, N.; Ruggeri, C. A.; Malki, K.; Bruder, K.; Elayyan, J.; Damisch, K.; Vahora, N.; O'Malley, P.; Ruggles-Sage, B.; et al. Assessment of a metaviromic dataset generated from nearshore Lake Michigan. Mar. Freshw. Res. 2016, 67, 1700–1708.
4. Mohiuddin, M.; Schellhorn, H. E. Spatial and temporal dynamics of virus occurrence in two freshwater lakes captured through metagenomic analysis. Front. Microbiol. 2015, 6, 960.
5. Skvortsov, T.; De Leeuwe, C.; Quinn, J. P.; McGrath, J. W.; Allen, C. C. R.; McElarney, Y.; Watson, C.; Arkhipova, K.; Lavigne, R.; Kulakov, L. A. Metagenomic characterisation of the viral community of Lough Neagh, the largest freshwater lake in Ireland. PLoS ONE 2016, 11, e0150361.
6. Arkhipova, K.; Skvortsov, T.; Quinn, J. P.; McGrath, J. W.; Allen, C. C. R.; Dutilh, B. E.; McElarney, Y.; Kulakov, L. A. Temporal dynamics of uncultured viruses: A new dimension in viral diversity. ISME J. 2018, 12, 199–211.
7. Moon, K.; Kang, I.; Kim, S.; Kim, S. J.; Cho, J. C. Genome characteristics and environmental distribution of the first phage that infects the LD28 clade, a freshwater methylotrophic bacterial group. Environ. Microbiol. 2017, 19, 4714–4727.
8. Okazaki, Y.; Nishimura, Y.; Yoshida, T.; Ogata, H.; Nakano, S.-i. Genome-resolved viral and cellular metagenomes revealed potential key virus-host interactions in a deep freshwater lake. Environ. Microbiol. 2019, 21, 4740–4754.
9. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
10. Buchfink, B.; Xie, C.; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2015, 12, 59–60.
11. Pruitt, K. D.; Tatusova, T.; Maglott, D. R. NCBI Reference Sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33, D501–D504.
12. Nurk, S.; Meleshko, D.; Korobeynikov, A.; Pevzner, P. A. MetaSPAdes: A new versatile metagenomic assembler. Genome Res. 2017, 27, 824–834.
13. Roux, S.; Enault, F.; Hurwitz, B. L.; Sullivan, M. B. VirSorter: Mining viral signal from microbial genomic data. PeerJ 2015, 3, e985.
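Tables S1–S3 give raw read counts per virotype, counts normalized to genome length, and a "95% dominant pool", but the description does not say how the normalization or the pool were computed. The Python sketch below shows one plausible way to derive such tables from BLASTn hits. Everything in it is an assumption rather than the authors' method: the standard 12-column tabular output (`-outfmt 6`), assigning each read to its single best hit by bit score, treating the subject accession as a virotype, scaling to reads per kilobase of genome, and defining the dominant pool as the top-ranked virotypes whose cumulative normalized abundance first reaches 95%. The function names and toy accessions are hypothetical.

```python
"""Sketch: per-virotype read tables from BLASTn hits (cf. Tables S1-S3).

Assumptions, not taken from the dataset: hits are in the standard 12-column
BLAST tabular format (-outfmt 6), each read is assigned to its single best
hit by bit score, and the subject accession stands in for a virotype.
All function names are hypothetical.
"""
from collections import Counter


def best_hits(blast_tsv: str) -> dict[str, str]:
    """Map each read (qseqid, col 1) to the subject (sseqid, col 2) with the top bit score (col 12)."""
    best: dict[str, tuple[float, str]] = {}
    for line in blast_tsv.strip().splitlines():
        cols = line.split("\t")
        read, subject, bitscore = cols[0], cols[1], float(cols[11])
        if read not in best or bitscore > best[read][0]:
            best[read] = (bitscore, subject)
    return {read: subject for read, (_, subject) in best.items()}


def raw_counts(assignments: dict[str, str]) -> Counter:
    """Initial number of reads per virotype (analogous to Table S1)."""
    return Counter(assignments.values())


def normalize_by_length(counts: Counter, genome_len: dict[str, int]) -> dict[str, float]:
    """Reads per kilobase of reference genome (one possible normalization, analogous to Table S2)."""
    return {v: n / (genome_len[v] / 1000) for v, n in counts.items()}


def dominant_pool(norm: dict[str, float], share: float = 0.95) -> list[str]:
    """Top-ranked virotypes whose summed abundance first reaches `share` of the total (cf. Table S3)."""
    total = sum(norm.values())
    pool: list[str] = []
    cumulative = 0.0
    for virotype, value in sorted(norm.items(), key=lambda kv: kv[1], reverse=True):
        pool.append(virotype)
        cumulative += value
        if cumulative >= share * total:
            break
    return pool


if __name__ == "__main__":
    # Toy hits: read r1 hits two references; the higher bit score (NC_A) wins.
    hits = "\n".join([
        "r1\tNC_A\t99.0\t150\t1\t0\t1\t150\t10\t159\t1e-70\t270",
        "r1\tNC_B\t90.0\t150\t15\t0\t1\t150\t10\t159\t1e-50\t200",
        "r2\tNC_B\t98.0\t150\t3\t0\t1\t150\t10\t159\t1e-65\t260",
        "r3\tNC_A\t97.0\t150\t4\t0\t1\t150\t10\t159\t1e-60\t250",
        "r4\tNC_C\t96.0\t150\t6\t0\t1\t150\t10\t159\t1e-55\t240",
    ])
    counts = raw_counts(best_hits(hits))
    norm = normalize_by_length(counts, {"NC_A": 40_000, "NC_B": 10_000, "NC_C": 200_000})
    print(counts)               # Counter({'NC_A': 2, 'NC_B': 1, 'NC_C': 1})
    print(dominant_pool(norm))  # ['NC_B', 'NC_A']
```

In the toy run, NC_A has the most raw reads but a long genome, so after length normalization NC_B ranks first, and the long-genome NC_C falls outside the 95% pool. This is why length-normalized tables such as Table S2 can rank virotypes differently from raw counts.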
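Table S6 reports reads per COG functional category for each sample, and Table S7 reports reads per protein. The description does not explain how protein-level counts were rolled up into categories. The Python sketch below shows one way to do it under stated assumptions. Each protein is assumed to carry a COG functional-category string with one letter per category (for example "L" for replication, recombination and repair, or "KL" for a protein assigned to two categories). A multi-category protein is assumed to credit its full read count to every listed category. The function name and toy identifiers are hypothetical.

```python
"""Sketch: reads per COG functional category for each sample (cf. Table S6).

Assumptions, not taken from the dataset: each predicted protein has a COG
functional-category string such as "L" or "KL" (one letter per category),
per-protein read counts are already known for each sample, and a protein
assigned to several categories credits its full read count to each one.
The function name is hypothetical.
"""
from collections import defaultdict


def reads_per_category(
    protein_category: dict[str, str],
    protein_reads: dict[str, dict[str, int]],
) -> dict[str, dict[str, int]]:
    """Return {sample: {category_letter: read_count}} summed over all proteins."""
    table: defaultdict[str, defaultdict[str, int]] = defaultdict(lambda: defaultdict(int))
    for protein, per_sample in protein_reads.items():
        # Proteins without a COG assignment contribute nothing, so a sample whose
        # proteins are all unannotated does not appear in the result.
        letters = protein_category.get(protein, "")
        for sample, n_reads in per_sample.items():
            for letter in dict.fromkeys(letters):  # de-duplicate, keep letter order
                table[sample][letter] += n_reads
    return {sample: dict(cats) for sample, cats in table.items()}


if __name__ == "__main__":
    categories = {"p1": "L", "p2": "KL", "p3": "S"}  # p4 has no COG hit
    reads = {"p1": {"s1": 5, "s2": 1}, "p2": {"s1": 3}, "p3": {"s2": 4}, "p4": {"s1": 7}}
    print(reads_per_category(categories, reads))
    # {'s1': {'L': 8, 'K': 3}, 's2': {'L': 1, 'S': 4}}
```

Crediting every listed category is the simplest convention, but it lets category totals exceed the number of annotated reads. Splitting a multi-category protein's reads evenly across its categories is an equally plausible alternative, and the dataset does not state which convention, if either, was used.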
Created: 2021-01-31