
SFB genomes and annotations

NIAID Data Ecosystem, indexed 2026-03-12
Download link:
https://zenodo.org/record/3249379
Description:
This dataset contains sequence files for a Metagenome Assembled Genome (MAG) from human metagenomes, as well as 5 SFB reference genomes:

GCF_000270205 Candidatus Arthromitus sp. SFB-mouse-Japan
GCF_000283555 Candidatus Arthromitus sp. SFB-rat-Yit
GCF_000284435 Candidatus Arthromitus sp. SFB-mouse-Yit
GCF_000709435 Candidatus Arthromitus sp. SFB-mouse-NL
GCF_001655775 Candidatus Arthromitus sp. SFB-turkey isolate UMNCA01

The dataset consists of 8 gzipped tar archives. Here is a brief summary of their contents:

sfb_abundance: Counts of mapped reads and normalized counts for each contig in 825 samples (see sfb_map). Files named 'raw_counts' contain the number of reads assigned to each contig, while files named 'tpm' contain counts normalized to Transcripts Per Million. The 'percontig' files show numbers per contig, while the raw_counts.tab and tpm.tab files have counts summed for each genome.

sfb_abundance.cds: Counts of mapped reads and normalized counts as above, but only for reads mapping to protein-coding regions.

sfb_annotations: Annotation files from running the prokka pipeline on the genomes, followed by eggnog-mapper, pfam_scan and dbCAN.

sfb_checkm: Results from running 'checkm lineage_wf' on the genomes.

sfb_collated: Collated counts of annotations in each genome.

sfb_fastani: Results from running fastANI on the genomes, with subsequent clustering of genomes based on 75% overlap and 95% ANI.

sfb_gtdb: Results from running 'gtdbtk classify_wf' on the genomes. This shows how the genomes are classified against the Genome Taxonomy Database (release 86).

sfb_gtdb_denovo: Phylogeny created by running the following command on the genomes:
gtdbtk de_novo_wf --bac120_ms --outgroup_taxon p__Patescibacteria -x .fna --cpus 20 --rnd_seed 123

sfb_map: Results from mapping reads from 825 samples to the 6 genomes. Reads were aligned using bowtie2 with the '--very-sensitive --no-unal' settings and '--score-min C,0,0' to report only reads aligning without mismatches. Output was sorted by position and duplicates were removed using MarkDuplicates from the picard tools suite. The archive contains a single merged BAM file ('sfb.bam') in which each sample has been assigned a ReadGroup inferred from its file name. Note that this mapping step was performed to investigate the presence of the SFB MAG in other metagenomes and was not part of the actual binning step.
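For readers comparing the 'raw_counts' and 'tpm' files in sfb_abundance, the following is a minimal sketch of the standard TPM calculation in Python. The description does not state exactly how the normalization was done, so this assumes the usual definition (counts divided by feature length, then scaled so that the values in a sample sum to one million). The function name, contig names, counts and lengths are hypothetical and are not taken from the archives.

```python
def tpm(raw_counts, lengths):
    """Convert raw read counts to TPM.

    raw_counts: dict mapping feature name -> number of assigned reads
    lengths:    dict mapping feature name -> feature length in bp
    Assumes the standard TPM definition; the dataset's own procedure
    may differ in detail.
    """
    # Length-normalized rate for each feature (reads per base).
    rates = {name: raw_counts[name] / lengths[name] for name in raw_counts}
    total = sum(rates.values())
    if total == 0:
        # No reads mapped in this sample: all TPM values are zero.
        return {name: 0.0 for name in raw_counts}
    # Scale so that the values sum to one million within the sample.
    return {name: rate / total * 1e6 for name, rate in rates.items()}


# Hypothetical example values for one sample (not from the dataset).
counts = {"contig_a": 100, "contig_b": 300}
lengths = {"contig_a": 1000, "contig_b": 3000}
values = tpm(counts, lengths)
```

Because both hypothetical contigs have the same number of reads per base, they receive the same TPM value even though their raw counts differ by a factor of three.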
Created:
2020-11-13