S1 - Genome metadata and metabolic analyses for existing and newly extracted Rickettsia and Megaira symbiotic bacteria
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://figshare.com/articles/dataset/S1_-_Genome_metadata_and_metabolic_analyses_for_existing_and_newly_extracted_Rickettsia_and_Megaira_symbiotic_bacteria/14865561
Resource description:
What is it
An Excel spreadsheet containing all metadata associated
with the new and existing Rickettsia, Torix Rickettsia (Ca. Tisiphia), Ca. Megaira
and Orientia genomes used in the pangenomic, metabolic and phylogenomic
analyses in the paper "Genomic diversity in the genus Rickettsia and its
sister groups indicate the torix group are evolutionarily distinct".
The tables are split into genome data (S1a) and metabolic
data (S1b). Each sheet contains the information as follows:
S1a – yellow tabs, genome and gene sequence metadata
-> S1a.1 – A list of published genomes. Includes accession numbers, the Rickettsia group each genome belongs to, CheckM quality scores, assembly level, and the analyses each genome was used in.
-> S1a.2 – Metadata for new genomes assembled in this study. Includes Rickettsia group, assembly level, CheckM completeness scores, genome size and N50 scores, host names, host accessions and brief details of host ecology.
-> S1a.3 – Accessions and host species for all rickettsial 16S rRNA, COI, gltA and 17kDa sequences used in MLST to identify the Rhyzobius (Oopac6), Meloidae (Ppec13), Moomin (Moomin) and unusual Belli (Dallo3) genomes extracted from Sequence Read Archive (SRA) data.
-> S1a.4 – GTDB-Tk taxonomic classification, which supports assigning the Torix group Rickettsia to its own clade.
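As a reminder of what the N50 scores in S1a.2 measure, the following is a minimal sketch of the standard N50 definition. It is not taken from the study's pipeline; the function name `n50` and the contig lengths are hypothetical, chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the contig length L such that contigs of length >= L
    together cover at least half of the total assembly size.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig downwards until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb), not values from the spreadsheet.
example_contigs = [80, 70, 50, 40, 30, 20]
genome_size = sum(example_contigs)
assembly_n50 = n50(example_contigs)
print(f"genome size = {genome_size} kb, N50 = {assembly_n50} kb")
```

A higher N50 relative to genome size indicates a less fragmented assembly, which is why it sits next to assembly level and CheckM completeness in the metadata.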
S1b – red tabs, pangenome and metabolic data
-> S1b.1 – The 74 core gene clusters with
COG and KEGG/kofam functions across 104 genomes from a pangenome of Rickettsia,
Megaira and Orientia constructed with Anvi’o. PHI scores are given for each
cluster. These gene clusters were later used in phylogeny construction.
-> S1b.2 – The 43 ribosomal protein gene
clusters with COG and KEGG/kofam functions across 104 genomes from a pangenome
of Rickettsia, Megaira and Orientia constructed with Anvi’o 7. These gene
clusters were later used in phylogeny construction.
-> S1b.3 – Raw metabolic pathway completion
scores for Rickettsia, Megaira and Orientia genomes used in metabolic heatmaps
in the main paper. Contains the KEGG module names, classes, categories, and
estimated completion scores produced through `anvi-estimate-metabolism` in Anvi’o
7.
-> S1b.4 – Functional enrichment scores
produced with `anvi-get-enriched-functions-per-pan-group` in Anvi’o 7.
Comparisons are made within Torix, within the main Rickettsia (excluding
Torix), between Torix and Megaira, and between Torix and Rickettsia.
-> S1b.5 – Average nucleotide identity (ANI) scores, with percentage similarity between each pair of genomes, produced with pyANI.
-> S1b.6 – Average amino acid identity (AAI) scores, with percentage similarity between each pair of genomes, produced with the Kostas lab AAI matrix tools.
-> S1b.7 – Data from the gene cluster content analysis used to construct accumulation curves and cluster similarity plots.
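To illustrate how gene cluster presence/absence data of the kind described in S1b.1 and S1b.7 relates to core gene clusters and accumulation curves, here is a minimal sketch in plain Python. It is not the authors' workflow (the paper's pangenome was built with Anvi’o); the genome names, cluster identifiers and function names below are all hypothetical.

```python
import random


def core_clusters(presence):
    """Gene clusters present in every genome of the pangenome."""
    return set.intersection(*presence.values())


def accumulation_curve(presence, order):
    """Cumulative count of distinct gene clusters as genomes are added in `order`."""
    seen = set()
    curve = []
    for genome in order:
        seen |= presence[genome]
        curve.append(len(seen))
    return curve


def mean_accumulation(presence, permutations=100, seed=0):
    """Average the accumulation curve over random genome orderings."""
    rng = random.Random(seed)
    genomes = sorted(presence)
    totals = [0] * len(genomes)
    for _ in range(permutations):
        order = genomes[:]
        rng.shuffle(order)
        for i, count in enumerate(accumulation_curve(presence, order)):
            totals[i] += count
    return [t / permutations for t in totals]


# Hypothetical presence/absence data: genome -> set of gene cluster IDs.
toy = {
    "genome_A": {"GC_1", "GC_2", "GC_3"},
    "genome_B": {"GC_1", "GC_2", "GC_4"},
    "genome_C": {"GC_1", "GC_5"},
}

print(sorted(core_clusters(toy)))
print(accumulation_curve(toy, ["genome_A", "genome_B", "genome_C"]))
print(mean_accumulation(toy))
```

Averaging over random orderings removes the dependence on the order in which genomes are added; a curve that keeps rising as genomes are added suggests an open pangenome.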
Why is it
Background data that are necessary for transparency but
unsuitable for inclusion in the paper as tables.
Use and benefit
Metadata and raw data that readers can examine and that
allow the analyses to be reproduced.
Created:
2021-10-05



