CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes: supplemental material
Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/checkm-assessing-quality-supplemental-material/3368502
Resource description:
Supplementary Results
Refinement for Gene Loss and Duplication
Estimates under Opal Stop Codon Recodings
Supplementary Methods
Identification of Trusted Reference Genomes
Refining Marker Sets for Lineage-specific Gene Loss and Duplication
Determination of Coding Table
Systematic Bias of Completeness and Contamination Estimates
Supplemental Figure S1. Distribution of the 104 bacterial and 281 gammaproteobacterial marker genes around the E. coli K12 genome.
Supplemental Figure S2. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the random contig model.
Supplemental Figure S3. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the inverse length model.
Supplemental Figure S4. Maximum-likelihood genome tree inferred from 5656 reference genomes.
Supplemental Figure S5. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the random fragment model using a window size of 20 kbp.
Supplemental Figure S6. Error in completeness and contamination estimates on simulated genomes with varying levels of completeness and contamination generated under the inverse length model.
Supplemental Figure S7. Error in completeness and contamination estimates on simulated genomes from different phyla.
Supplemental Figure S8. Bias in completeness and contamination estimates when modelled as a binomial distribution.
Supplemental Figure S9. GC-distribution plots of the HMP Capnocytophaga sp. oral taxon 329 genome.
Supplemental Figure S10. Phylogenetic placement of the two genomes (Cluster 0 and Cluster 1) identified within the HMP Capnocytophaga sp. oral taxon 329 genome.
Supplemental Figure S11. Completeness estimates for 90 putative population genomes recovered from an acetate-amended aquifer.
Supplemental Figure S12. Contamination estimates for 90 putative population genomes recovered from an acetate-amended aquifer.
Supplemental Figure S13. Identification of the 213 marker genes within the Meyerdierks et al. (2010) ANME-1 genome.
Supplemental Figure S14. Refining a marker set for lineage-specific gene loss and duplication.
Supplemental Tables
Supplemental Table S1. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using different universal- and domain-specific marker gene sets.
Supplemental Table S2. Number of marker genes and marker sets for taxonomic groups with ≥ 20 reference genomes.
Supplemental Table S3. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S4. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S5. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker genes treated individually (IM) or organized into collocated marker sets (MS).
Supplemental Table S6. Phylogenetically informative marker genes used to infer the reference genome tree along with matching PhyloSift genes.
Supplemental Table S7. Phylogenetically informative genes used in PhyloSift without a matching CheckM gene.
Supplemental Table S8. Mean absolute error of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S9. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S10. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms), the lineage-specific marker set selected by CheckM (sms), and the best performing lineage-specific marker set (bms).
Supplemental Table S11. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker set selected by CheckM (sms).
Supplemental Table S12. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker sets selected by CheckM (sms).
Supplemental Table S13. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates determined using domain-specific marker sets (dms) and the lineage-specific marker sets selected by CheckM (sms).
Supplemental Table S14. Taxonomic rank of the selected lineage-specific marker set used for evaluating the quality of genomes at different degrees of taxonomic novelty.
Supplemental Table S15. Mean absolute error and standard deviation of completeness (comp.) and contamination (cont.) estimates for simulated genomes at different degrees of taxonomic novelty.
Supplemental Table S16. Lineage-specific completeness and contamination estimates for isolate genomes from large-scale sequencing initiatives. (see Excel file)
Supplemental Table S17. Completeness and contamination estimates of the Lactobacillus gasseri MV-22 genome for increasingly basal lineage-specific marker sets.
Supplemental Table S18. Bacterial marker genes identified within the HMP Lactobacillus gasseri genomes. Markers missing from a genome or present in multiple copies are highlighted with a grey background.
Supplemental Table S19. Lineage-specific completeness and contamination estimates for genomes annotated as finished at IMG, along with predicted translation tables and calculated coding density. (see Excel file)
Supplemental Table S20. Lineage-specific completeness and contamination estimates for single-cell genomes from the GEBA-MDM initiative along with traditional assembly statistics. (see Excel file)
Supplemental Table S21. Lineage-specific completeness and contamination estimates for population genomes, plasmids, and phage recovered from metagenomic datasets along with traditional assembly statistics. (see Excel file)
Supplemental Table S22. Completeness and contamination estimates for population genomes recovered from an acetate-amended aquifer determined using domain-level and lineage-specific marker sets. (see Excel file)
Provider:
The University of Queensland



