
Comparative genomics of human distal lung Streptococci

Indexed from the NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/10220078
Description:
Comparative genomics of human lung streptococcal isolates

1. All analysis pipelines and scripts are on the GitHub page of Slipa Kanungo: https://github.com/slipa17/Whole-genome-sequencing-and-comparative-genomics-of-human-lung-streptococcal-isolates
2. Additional analysis pipelines (especially for Dataset S12) are on the GitHub page of Garance Sarton-Lohéac: https://github.com/gsartonl/Publication_Sarton-Loheac_2022
3. All raw data were uploaded to NCBI SRA under BioProject PRJNA1001255.

Supplementary Table bundle: for peer review purposes

Supplementary Datasets

Dataset S1_Lung_Streptococcus_genomes_metaQUAST: MetaQUAST (Quality Assessment Tool for Metagenome Assemblies) output, including HTML and PDF reports and summary statistics such as total contigs, assembly size, and N50. The coverage analysis assesses how well reference genomes are represented, the contig length distribution plots show the range of contig lengths, the mis-assembly analysis flags potential assembly errors, and graphical representations visualize the assemblies.

Dataset S2_Lung_streptococcus_isolate_genomes: Nucleotide FASTA files of the six lung streptococcal isolates, obtained via whole genome sequencing.

Dataset S3_Lung_isolates_genome_annotation_prokka: Output folders from annotating the six lung streptococcal isolates with PROKKA. These include protein FASTA files, GenBank files, and GFF annotations.

Dataset S4_TYGS_dDDH_analysis: Results of the TYGS analysis from DSMZ, including downloadable reports. Outputs include taxonomic identification with genus, species, and strain details, a TYGS index for tracking genomes, genome quality assessment metrics, GBDP whole-genome and 16S rRNA phylogenetic tree files, and a table of comparisons with reference type strains in the TYGS database.

Dataset S5_Reference_type_strains_TYGS_genomes: Nucleotide FASTA files of 47 closely related reference Streptococcus genomes listed by TYGS and downloaded from NCBI.

Dataset S6_Reference_type_strains_TYGS_proteins: Protein FASTA files of 47 closely related reference Streptococcus genomes listed by TYGS and downloaded from NCBI.

Dataset S7_Lung_streptococcus_isolate_proteins: Protein FASTA files of the six lung streptococcal isolates.

Dataset S8_OrthoFinder_core_genome: OrthoFinder is a bioinformatics tool for orthology inference across multiple genomes. The output includes overall statistics, gene duplication information, orthologous genes, orthologous gene trees, single-copy orthologous genes, and STAG evolutionary trees.

Dataset S9_Pan-Strep_BLAST_db: BLAST database built with the NCBI makeblastdb command-line tool from the protein FASTA files of 47 closely related reference Streptococcus genomes listed by TYGS and downloaded from NCBI.

Dataset S10_OrthoVenn_cluster_files: OrthoVenn is a web-based tool for orthologous gene comparison. Downloadable results include Venn diagrams depicting shared and unique orthologous clusters among species, and tabular results detailing the genes within each cluster and their annotations. Functional enrichment analyses for Gene Ontology terms and KEGG pathways are also provided to aid biological interpretation.

Dataset S11_COG_analysis: Results of COG analysis run individually on each of the six lung streptococcal isolates using the eggNOG (evolutionary genealogy of genes: Non-supervised Orthologous Groups) web tool. The output assigns genes to Clusters of Orthologous Groups (COGs) and categorizes them into functional groups such as metabolism, information storage and processing, and cellular processes and signalling.

Dataset S12_CAZymes_lung_streptococci: Results of CAZyme analysis using a custom rule-based pipeline based mostly on dbCAN (Database for Carbohydrate-Active enZymes), which provides information on the carbohydrate-active enzymes present in genomic datasets. The output includes the annotation of enzymes involved in the degradation, modification, or biosynthesis of carbohydrates: glycoside hydrolases (GH), glycosyltransferases (GT), carbohydrate-binding modules (CBM), auxiliary activities (AA), carbohydrate esterases (CE), and polysaccharide lyases (PL).

Dataset S13_pneumolysin_analysis: Alignment and phylogeny results for the pneumolysin protein in Streptococcus pneumoniae, Streptococcus pseudopneumoniae, and Streptococcus isolate P2E5, identified by ABRicate analysis. Plots were generated with pyGenomeViz.

Dataset S14_capsule_analysis: Results from BLAST analysis of the Streptococcus pneumoniae D39 capsular biosynthesis operon genes against the Pan-Strep BLAST database (Dataset S9). Matching genes were extracted, then aligned and used to build a phylogeny. Plots were generated with pyGenomeViz.

Dataset S15_Lung_isolate_HOMD_TYGS_comparison: Protein FASTA files of the 47 closely related reference Streptococcus genomes listed by TYGS and downloaded from NCBI, the six lung streptococcal isolates, and 47 streptococcal genomes downloaded from the Human Oral Microbiome Database (eHOMD).
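Dataset S1 reports N50 among its assembly summary statistics. As a rough illustration of what that number means, the sketch below computes N50 from the contig lengths in a nucleotide FASTA file such as those in Dataset S2. It is not part of the authors' pipelines (MetaQUAST computes this value itself), and the function names `n50` and `fasta_lengths` and the file name in the usage comment are hypothetical.

```python
def n50(contig_lengths):
    """Return N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly length.
    Returns 0 for an empty input."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


def fasta_lengths(path):
    """Yield the length of each sequence in a plain-text FASTA file."""
    length = None  # None until the first header line is seen
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if length is not None:
                    yield length
                length = 0
            elif length is not None:
                length += len(line)
    if length is not None:
        yield length


# Usage (hypothetical file name, assumed to be an uncompressed FASTA):
# print(n50(fasta_lengths("isolate_genome.fasta")))
```

For example, contigs of lengths 2, 2, 2, 3, 3, 4, 8, and 8 sum to 32, and the two 8-long contigs already cover half of that, so the N50 is 8.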
Created:
2023-12-28