
Supplementary dataset for The sociality of pathogenicity and virulence throughout Human pathogen diversity

Indexed from NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/5089776
Description:
This supplementary dataset provides the list of species included in each analysis and the corresponding representative strain genomes used. There are two lists of species: one used for the comparative analysis of virulence factors and pathogenicity, and one used for the comparative analysis of case fatality rate. For each set, the resulting annotations are provided as tab-delimited tables, both at gene level and summarised at species level. These datasets can be used directly to replicate our analysis with the code provided at https://github.com/CamilleAnna/CooperativePathogenicityVirulence_repo.git.

1.1_pathogen_commensal_genomes_118.txt: list of pathogen and non-pathogen species and the corresponding representative strain genome used in the comparative analysis of virulence factors and pathogenicity. Fields:
  species_id = species/strain id in MIDAS database
  genome_name = corresponding genome name in PATRIC database
  genome_id = corresponding genome id in PATRIC database
  is_rep_genome = whether the genome is the representative strain in MIDAS database
  count_genomes = number of genomes for this species in MIDAS database
  genus = genus for species
  species = species name for plotting
  pathogen = binary variable indicating whether the species was classified as pathogen or non-pathogen
  gram_profile = gram profile assigned to species (used for specifying gram profile in PSORTb pipeline), with p = gram-positive, n = gram-negative, OM+ = gram-positive with outer membrane

2.2_assembled_SPECIES_annotation.txt: assembled dataset for species included in the comparative analysis of pathogenicity (n = 118 species).
Fields 1 to 3: genome used for analysis, with:
  species_id = species/strain id in MIDAS database
  genus = genus for species
  species = species name for plotting
Fields 4 to 13: genome annotations, with:
  gram_profile = species gram staining profile
  total_cds = proteome size
  is_victor_vf = number of CDS annotated as virulence factors in VICTOR database
  pathogen = pathogenic status
  The other fields give the number of genes for this species annotated with each of the six forms of cooperation.

2.3_assembled_GENES_annotation.txt: gene-level annotations that were used to assemble the table “2.2_assembled_SPECIES_annotation.txt”. This dataset was used for the comparative analysis of virulence factors (n = 118 species, nrows = 367162 genes in total). Each CDS is flagged “1” for each of the six forms of cooperation it was annotated with, and if it was recorded as a virulence factor in the VICTOR database. “peg” and “product_patric” give the CDS name and annotation in the PATRIC database.

3.2_cfr_SUPFAM_match: list of pathogen and non-pathogen species and the corresponding representative strain genome used in the comparative analysis of virulence factors and pathogenicity. Fields:
  Legget_species: species name in the Leggett et al. (2017) supplementary data table
  matching_supfam_id: corresponding strain code in SCOP database
  genome_name: corresponding genome name in PATRIC database
  genome_id: corresponding genome id in PATRIC database
  file_name_record: internal naming for annotations
  gram_profile: gram profile assigned to species (used for specifying gram profile in PSORTb pipeline), with p = gram-positive, n = gram-negative, OM+ = gram-positive with outer membrane

4.1_assembled_CFR_SPECIES_annotation.txt: assembled dataset for species included in the comparative analysis of case fatality rate (n = 50 species).
Fields 1 to 10: ecological data about the species, as provided in the Leggett et al. (2017) supplementary tables.
Fields 11 to 13: genome used for analysis, with:
  species_id = internal naming
  genome_id = corresponding PATRIC genome id
  matching_supfam_id = corresponding strain code in SCOP database
Fields 14 to 22: genome annotations, with:
  gram_profile = species gram staining profile
  total_cds = proteome size
  is_victor_vf = number of CDS annotated as virulence factors in VICTOR database
  The other fields give the number of genes for this species annotated with each of the six forms of cooperation.

4.2_assembled_CFR_GENES_annotation.txt: gene-level annotations that were used to assemble the table “4.1_assembled_CFR_SPECIES_annotation.txt” (n = 50 species, nrows = 180754 genes in total). Each CDS is flagged “1” for each of the six forms of cooperation it was annotated with, and if it was recorded as a virulence factor in the VICTOR database. “peg” and “product_patric” give the CDS name and annotation in the PATRIC database.

midas_tree_renamed.newick: phylogeny used for comparative analyses of virulence factors and pathogenicity. We used the MIDAS phylogeny of PATRIC genomes, which we trimmed to our focus species (N = 118) and ultrametricised using the chronopl function in the ape package.

Supfam_cfr_tree.newick: phylogeny used for comparative analyses of virulence factors and pathogenicity. We used the phylogeny generated by the SCOP database, which we ultrametricised using the chronopl function in the ape package.
Created:
2021-09-29