The results of the comparative taxonomic and genomic analysis of the viromes from Lake Baikal and other freshwater bodies
Source: DataCite Commons. Updated 2025-06-01. Indexed 2024-07-28.
Download link:
https://figshare.com/articles/dataset/The_results_of_the_comparative_taxonomic_and_genomic_analysis_of_the_viromes_from_Lake_Baikal_and_other_freshwater_bodies/12814637/1
Resource description:
The dataset contains tables and files presenting the results of a comparative genomic, taxonomic and functional analysis of viral communities from Lake Baikal and other freshwater lakes [1-8]. Taxonomic identification of the sequences (metagenomic reads) was carried out with the BLASTn [9] and DIAMOND [10] programs against the NCBI RefSeq complete viral genome and proteome database [11]. De novo assembly was carried out with metaSPAdes, the metagenomic assembler of SPAdes 3.13.0 [12]. The VirSorter tool [13] was used to identify viral scaffolds and viral proteins. Functional annotation of viral proteins in the Lake Baikal viromes was carried out using the COG (Clusters of Orthologous Groups) [14] and KEGG pathway [15] classification groups.

For the taxonomic analysis, DNA reads were compared with complete viral genomes using BLASTn on five high-performance nodes, each with two Intel Xeon E5-2695 v4 "Broadwell" CPUs (36 cores in total) and 128 GB RAM; total calculation time was ~24 hours. Comparisons of DNA reads with complete viral proteomes using DIAMOND ran on five nodes of the same configuration, with a total calculation time of ~3 hours. The paired-read assembly was performed on AMD Opteron 6278 hardware (8 CPUs, 64 cores in total) with 945 GB RAM; total assembly time was ~399 hours.

Full list of Supplementary Materials

Figures
Figure S1: A complete scheme of the bioinformatic analysis. The stages of the analysis are highlighted with red boxes, the resulting datasets are blue with a dashed stroke, and the databases (DB) used are purple.
Figure S2: Dominant viral families in the investigated Baikal viromes. Only percentages greater than one percent are shown in the diagrams.
Figure S3: The general functional annotation of the Baikal virome datasets using the COG (A) and KEGG pathway (B) databases.
Figure S4: The phyla of the hosts predicted for the revealed Baikal viruses using the Virus-Host database (A) and the VirHostMatcher-Net software (B).

Tables
Table S1: The list and taxonomy of virotypes revealed in the analyzed freshwater viromes (initial number of reads per virotype).
Table S2: The list and taxonomy of virotypes revealed in the analyzed freshwater viromes (number of reads per virotype normalized to genome length).
Table S3: The list and taxonomy of virotypes revealed in the analyzed freshwater viromes (number of reads per virotype normalized to genome length, for the 95% dominant pool).
Table S4: Taxonomic identification of the Baikal viral scaffolds, the similarity of detected ORFs and RefSeq proteins, and the number of virome reads per scaffold.
Table S5: Main and secondary KO (KEGG Orthology) functional categories of predicted viral proteins and the number of reads related to these functions in the Baikal samples.
Table S6: COG (Clusters of Orthologous Groups) classification of viral proteins revealed in the Baikal viromes (the columns indicate the number of reads per COG functional category for each sample).
Table S7: The predicted viral proteins or enzymes identified with the COG database in the Baikal viromes (the columns indicate the number of reads for each protein in the samples).
Table S8: The auxiliary metabolic genes (AMGs) revealed among predicted proteins in the Baikal viromes.
Table S9: The known hosts for the Baikal virotypes identified using the Virus-Host database.
Table S10: The list of the hosts predicted for the revealed Baikal viruses using the VirHostMatcher-Net software.

Files
File S1: Scaffolds obtained using the metaSPAdes software.
File S2: Viral scaffolds identified with the VirSorter tool.
File S3: Viral proteins identified with the VirSorter tool.

References
1. Butina, T. V.; Bukin, Y. S.; Krasnopeev, A. S.; Belykh, O. I.; Tupikin, A. E.; Kabilov, M. R.; Sakirko, V.; Belikov, S. I. Estimate of the diversity of viral and bacterial assemblage in the coastal water of Lake Baikal. FEMS Microbiol. Lett. 2019, 366, fnz094.
2. Potapov, S. A.; Tikhonova, I. V.; Krasnopeev, A. Y.; Kabilov, M. R.; Tupikin, A. E.; Chebunina, N. S.; Zhuchenko, N. A.; Belykh, O. I. Metagenomic analysis of virioplankton from the pelagic zone of Lake Baikal. Viruses 2019, 11, 991.
3. Watkins, S. C.; Kuehnle, N.; Ruggeri, C. A.; Malki, K.; Bruder, K.; Elayyan, J.; Damisch, K.; Vahora, N.; O’Malley, P.; Ruggles-Sage, B.; et al. Assessment of a metaviromic dataset generated from nearshore Lake Michigan. Mar. Freshw. Res. 2016, 67, 1700–1708.
4. Mohiuddin, M.; Schellhorn, H. E. Spatial and temporal dynamics of virus occurrence in two freshwater lakes captured through metagenomic analysis. Front. Microbiol. 2015, 6, 960.
5. Skvortsov, T.; De Leeuwe, C.; Quinn, J. P.; McGrath, J. W.; Allen, C. C. R.; McElarney, Y.; Watson, C.; Arkhipova, K.; Lavigne, R.; Kulakov, L. A. Metagenomic characterisation of the viral community of Lough Neagh, the largest freshwater lake in Ireland. PLoS One 2016, 11, e0150361.
6. Arkhipova, K.; Skvortsov, T.; Quinn, J. P.; McGrath, J. W.; Allen, C. C. R.; Dutilh, B. E.; McElarney, Y.; Kulakov, L. A. Temporal dynamics of uncultured viruses: A new dimension in viral diversity. ISME J. 2018, 12, 199–211.
7. Moon, K.; Kang, I.; Kim, S.; Kim, S. J.; Cho, J. C. Genome characteristics and environmental distribution of the first phage that infects the LD28 clade, a freshwater methylotrophic bacterial group. Environ. Microbiol. 2017, 19, 4714–4727.
8. Okazaki, Y.; Nishimura, Y.; Yoshida, T.; Ogata, H.; Nakano, S.-i. Genome-resolved viral and cellular metagenomes revealed potential key virus-host interactions in a deep freshwater lake. Environ. Microbiol. 2019, 21, 4740–4754.
9. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 1990, 215, 403–410.
10. Buchfink, B.; Xie, C.; Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 2014, 12, 59–60.
11. Pruitt, K. D.; Tatusova, T.; Maglott, D. R. NCBI Reference Sequence (RefSeq): A curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2005, 33, D501–D504.
12. Nurk, S.; Meleshko, D.; Korobeynikov, A.; Pevzner, P. A. MetaSPAdes: A new versatile metagenomic assembler. Genome Res. 2017, 27, 824–834.
13. Roux, S.; Enault, F.; Hurwitz, B. L.; Sullivan, M. B. VirSorter: Mining viral signal from microbial genomic data. PeerJ 2015, 3, e985.
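Tables S2 and S3 report read counts normalized to genome length, with Table S3 restricted to a 95% dominant pool. The record does not state the exact formulas, so the Python sketch below shows only one plausible reading: reads per virotype are divided by genome length (scaled here to reads per kilobase), and the dominant pool is taken as the most abundant virotypes that together reach 95% of the normalized total. The function names, the per-kilobase scaling, the cumulative-threshold rule and all numbers are assumptions made for illustration; they are not taken from the dataset.

```python
from typing import Dict, List


def normalize_by_genome_length(
    read_counts: Dict[str, int],
    genome_lengths_bp: Dict[str, int],
    scale_bp: int = 1000,
) -> Dict[str, float]:
    """Reads per `scale_bp` bases of genome (assumed normalization)."""
    return {
        name: read_counts[name] * scale_bp / genome_lengths_bp[name]
        for name in read_counts
    }


def dominant_pool(abundance: Dict[str, float], fraction: float = 0.95) -> List[str]:
    """Most abundant entries whose cumulative share first reaches `fraction`
    (assumed definition of the dominant pool)."""
    total = sum(abundance.values())
    ranked = sorted(abundance.items(), key=lambda kv: kv[1], reverse=True)
    pool: List[str] = []
    running = 0.0
    for name, value in ranked:
        if running >= fraction * total:
            break
        pool.append(name)
        running += value
    return pool


# Hypothetical virotypes, read counts and genome lengths (not from the dataset).
counts = {"virotype_A": 900, "virotype_B": 500, "virotype_C": 20}
lengths = {"virotype_A": 150_000, "virotype_B": 50_000, "virotype_C": 40_000}

normalized = normalize_by_genome_length(counts, lengths)
pool = dominant_pool(normalized)
print(normalized)
print(pool)
```

In this toy input, virotype_A has the most raw reads but also the longest genome, so after normalization virotype_B ranks first. That reordering is the usual reason for reporting length-normalized counts alongside the raw counts of Table S1.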
Provider:
figshare
Created:
2021-04-03



