Supplementary dataset for The sociality of pathogenicity and virulence throughout Human pathogen diversity
Indexed in NIAID Data Ecosystem: 2026-03-12
Download link: https://zenodo.org/record/5089776
Resource description:
This supplementary dataset provides the list of species included in each analysis and corresponding representative strain genomes used. There are two lists of species: one used for the comparative analysis of virulence factors and pathogenicity, and one used for the case fatality rate comparative analysis.
For each set, the resulting annotations are provided as tab-delimited tables, both as gene-level annotations and summarised at the species level. These datasets can be used directly to replicate our analysis with the code provided at https://github.com/CamilleAnna/CooperativePathogenicityVirulence_repo.git.
1.1_pathogen_commensal_genomes_118.txt: list of pathogen and non-pathogen species and the corresponding representative strain genomes used in the comparative analysis of virulence factors and pathogenicity. Fields:
species_id = species/strain id in MIDAS database
genome_name = corresponding genome name in the PATRIC database
genome_id = corresponding genome id in the PATRIC database
is_rep_genome = whether the genome is the representative strain genome in the MIDAS database
count_genomes = number of genomes for this species in the MIDAS database
genus = genus of the species
species = species name used for plotting
pathogen = binary variable indicating whether the species was classified as a pathogen or a non-pathogen
gram_profile = gram profile assigned to the species (used to specify the gram profile in the PSORTb pipeline), with p = gram-positive, n = gram-negative, OM+ = gram-positive with outer membrane.
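As a minimal sketch of reading this list with the Python standard library, the function below tallies species by `pathogen` and `gram_profile`. It assumes the file has a header row containing exactly the field names listed above; the function name `summarise_genome_list` is hypothetical, and because the coding of the `pathogen` variable is not stated here, its values are kept as raw strings.

```python
import csv
from collections import Counter
from typing import TextIO


def summarise_genome_list(handle: TextIO) -> dict:
    """Tally species by pathogen status and gram profile in a tab-delimited
    genome list such as 1.1_pathogen_commensal_genomes_118.txt.

    Assumption: the first line is a header with the documented field names.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    by_status: Counter = Counter()
    by_gram: Counter = Counter()
    n_rows = 0
    for row in reader:
        n_rows += 1
        # Values are counted as raw strings; the 0/1 (or other) coding is not documented.
        by_status[row["pathogen"]] += 1
        by_gram[row["gram_profile"]] += 1
    return {
        "n_species": n_rows,
        "pathogen": dict(by_status),
        "gram_profile": dict(by_gram),
    }
```

For example, `summarise_genome_list(open("1.1_pathogen_commensal_genomes_118.txt", newline=""))` should report `n_species == 118` if the file holds one row per species, as its name suggests.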
2.2_assembled_SPECIES_annotation.txt: assembled dataset for species included in comparative analysis of pathogenicity (n = 118 species).
Fields 1 to 3: genome used for analysis, with:
species_id = species/strain id in MIDAS database
genus = genus for species
species = species name for plotting.
Fields 4 to 13: genome annotations, with:
gram_profile = species gram staining profile
total_cds = proteome size
is_victor_vf = number of CDS annotated as virulence factors in VICTOR database
pathogen = pathogenic status
the remaining fields give the number of genes in this species annotated with each of the six forms of cooperation.
2.3_assembled_GENES_annotation.txt: gene-level annotations used to assemble the table “2.2_assembled_SPECIES_annotation.txt”. This dataset was used for the comparative analysis of virulence factors (n = 118 species; nrows = 367162 genes in total). Each CDS is flagged “1” for each of the six forms of cooperation it was annotated with, and for whether it was recorded as a virulence factor in the VICTOR database. The “peg” and “product_patric” fields give the CDS name and its annotation in the PATRIC database.
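The relationship between the gene-level and species-level tables can be sketched as a group-by-and-sum. The Python below is an illustration, not the authors' pipeline (their code is in the GitHub repository linked above). It assumes the gene table has a `species_id` column and one row per CDS; since the six cooperation column names are not listed in this description, they must be passed in through the hypothetical `flag_columns` parameter of the hypothetical `species_counts` helper.

```python
import csv
from collections import defaultdict
from typing import Dict, Sequence, TextIO


def species_counts(
    handle: TextIO,
    flag_columns: Sequence[str],
    species_column: str = "species_id",  # assumed column name
) -> Dict[str, Dict[str, int]]:
    """Count CDS per species and sum 0/1 gene-level flags per species.

    Assumes one row per CDS, so the row count per species stands in for
    total_cds. flag_columns might be "is_victor_vf" plus the six
    cooperation columns (their real names are not given here).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"total_cds": 0, **{col: 0 for col in flag_columns}}
    )
    for row in reader:
        counts = totals[row[species_column]]
        counts["total_cds"] += 1
        for col in flag_columns:
            # A gene counts towards a category only if its cell is exactly "1";
            # row[col] raises KeyError if a column name is misspelled.
            if (row[col] or "").strip() == "1":
                counts[col] += 1
    return {species: dict(c) for species, c in totals.items()}
```

The same approach would apply to the case fatality rate gene table, provided its layout matches these assumptions.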
3.2_cfr_SUPFAM_match: list of species and the corresponding representative strain genomes used in the comparative analysis of case fatality rate. Fields:
Legget_species: species name in the Leggett et al. (2017) supplementary data table
matching_supfam_id: corresponding strain code in SCOP database
genome_name: corresponding genome name in PATRIC database
genome_id: corresponding genome id in PATRIC database
file_name_record: internal naming for annotations
gram_profile: gram profile assigned to the species (used to specify the gram profile in the PSORTb pipeline), with p = gram-positive, n = gram-negative, OM+ = gram-positive with outer membrane.
4.1_assembled_CFR_SPECIES_annotation.txt: assembled dataset for species included in the comparative analysis of case fatality rate (n = 50 species). Fields 1 to 10 are ecological data about the species, taken from the Leggett et al. (2017) supplementary tables.
Fields 11 to 13: genome used for analysis, with:
species_id = internal naming
genome_id = corresponding PATRIC genome id
matching_supfam_id = corresponding strain code in SCOP database
Fields 14 to 22: genome annotations, with:
gram_profile = species gram staining profile
total_cds = proteome size
is_victor_vf = number of CDS annotated as virulence factors in VICTOR database
the remaining fields give the number of genes in this species annotated with each of the six forms of cooperation.
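The field ranges given for the species-level tables are 1-based and inclusive. A minimal Python sketch for splitting a header row into these documented groups follows; it assumes each file begins with a header row in the documented column order, and the `group_header` helper and the group names are hypothetical.

```python
import csv
from typing import Dict, List, Mapping, TextIO, Tuple

# Documented 1-based, inclusive field ranges (group names are hypothetical).
SPECIES_2_2_GROUPS: Dict[str, Tuple[int, int]] = {
    "genome": (1, 3),
    "annotations": (4, 13),
}
CFR_4_1_GROUPS: Dict[str, Tuple[int, int]] = {
    "ecology_leggett": (1, 10),
    "genome": (11, 13),
    "annotations": (14, 22),
}


def group_header(
    handle: TextIO, groups: Mapping[str, Tuple[int, int]]
) -> Dict[str, List[str]]:
    """Split a tab-delimited header row into the documented field groups."""
    header = next(csv.reader(handle, delimiter="\t"))
    last_field = max(end for _, end in groups.values())
    if len(header) < last_field:
        raise ValueError(f"expected at least {last_field} columns, found {len(header)}")
    # Convert each 1-based inclusive range to a 0-based Python slice.
    return {name: header[start - 1:end] for name, (start, end) in groups.items()}
```

Raising an error on a short header is a deliberate choice: a silently truncated slice would otherwise hide a column-order mismatch.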
4.2_assembled_CFR_GENES_annotation.txt: gene-level annotations used to assemble the table “4.1_assembled_CFR_SPECIES_annotation.txt” (n = 50 species; nrows = 180754 genes in total). Each CDS is flagged “1” for each of the six forms of cooperation it was annotated with, and for whether it was recorded as a virulence factor in the VICTOR database. The “peg” and “product_patric” fields give the CDS name and its annotation in the PATRIC database.
midas_tree_renamed.newick: phylogeny used for the comparative analyses of virulence factors and pathogenicity. We used the MIDAS phylogeny of PATRIC genomes, trimmed it to our focal species (N = 118), and made it ultrametric using the chronopl function in the ape R package.
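An ultrametric tree is one in which every tip lies at the same distance from the root. The sketch below checks that property for a Newick string in plain Python. It is not a reimplementation of `chronopl` (which estimates divergence times by penalized likelihood in R), only a sanity check one might run on its output. It assumes unquoted tip labels, no bracketed comments, and treats a missing branch length as 0; `tip_depths` and `is_ultrametric` are hypothetical helpers.

```python
from typing import Dict, List, Tuple


def tip_depths(newick: str) -> Dict[str, float]:
    """Return the root-to-tip path length of every tip in a Newick string.

    Minimal parser: assumes unquoted labels, no [comments], and treats a
    missing branch length as 0. Internal node labels are read and ignored.
    """
    # Unquoted Newick labels cannot contain blanks, so all whitespace is dropped.
    s = "".join(newick.split()).rstrip(";")
    pos = 0

    def read_label_and_length() -> Tuple[str, float]:
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in ",():":
            pos += 1
        label = s[start:pos]
        length = 0.0
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",()":
                pos += 1
            length = float(s[start:pos])
        return label, length

    def parse_node() -> Tuple[List[Tuple[str, float]], float]:
        """Parse one subtree; return (tip, distance below this node) pairs
        and the length of the branch above this node."""
        nonlocal pos
        if pos >= len(s):
            raise ValueError("unexpected end of Newick string")
        if s[pos] != "(":
            label, length = read_label_and_length()
            return [(label, 0.0)], length
        pos += 1  # consume "("
        tips: List[Tuple[str, float]] = []
        while True:
            child_tips, child_len = parse_node()
            tips.extend((name, d + child_len) for name, d in child_tips)
            if pos >= len(s):
                raise ValueError("unbalanced parentheses")
            if s[pos] == ",":
                pos += 1
            elif s[pos] == ")":
                pos += 1
                break
            else:
                raise ValueError(f"unexpected {s[pos]!r} at position {pos}")
        _internal_label, length = read_label_and_length()
        return tips, length

    # The root edge, if any, is not part of any root-to-tip path.
    tips, _root_length = parse_node()
    if pos != len(s):
        raise ValueError(f"trailing characters at position {pos}")
    return dict(tips)


def is_ultrametric(newick: str, rel_tol: float = 1e-6) -> bool:
    """True if all root-to-tip distances agree within a relative tolerance."""
    depths = list(tip_depths(newick).values())
    return max(depths) - min(depths) <= rel_tol * max(depths)
```

For example, `is_ultrametric(open("midas_tree_renamed.newick").read())` would be expected to return `True` for a tree produced as described, within floating-point tolerance.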
Supfam_cfr_tree.newick: phylogeny used for the comparative analysis of case fatality rate. We used the phylogeny generated from the SCOP database, which we made ultrametric using the chronopl function in the ape R package.
Created: 2021-09-29



