Bacterial training dataset for Galaxy training network tutorials on Genome assembly
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://zenodo.org/records/582600
Description:
This training dataset is from an imaginary Staphylococcus aureus bacterium with a miniature genome. It contains a reference genome in several formats, plus FASTQ reads from a closely related, equally imaginary mutant strain.
It is a useful dataset for demonstrating:
de novo genome assembly
read mapping and variant calling
genome annotation
The files included are:
wildtype.fna: the reference genome sequence of the wildtype strain in FASTA format (a header line, then the nucleotide sequence of the genome).
wildtype.gff: the reference genome of the wildtype strain in General Feature Format (a list of features, one per line, followed by the nucleotide sequence of the genome).
wildtype.gbk: the reference genome of the wildtype strain in GenBank format.
mutant_R1.fastq and mutant_R2.fastq: FASTQ sequence reads of a closely related mutant strain.
The reads are paired-end.
Each read is 150 bases long.
The total number of bases sequenced is about 19 times the length of the wildtype genome sequence, i.e. a read coverage of 19x, which is rather low.
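The coverage figure is simply the total number of sequenced bases divided by the genome length. A minimal Python sketch of that arithmetic follows. The function names are hypothetical, the parsers assume plain uncompressed FASTA and four-line FASTQ records, and the tiny in-memory sequences are made up for illustration; they are not taken from the dataset.

```python
import io


def fasta_length(handle):
    """Count sequence bases in a FASTA stream, skipping '>' header lines."""
    return sum(len(line.strip()) for line in handle if not line.startswith(">"))


def fastq_bases(handle):
    """Count bases in a FASTQ stream, assuming exactly 4 lines per record."""
    total = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            total += len(line.strip())
    return total


def coverage(genome_length, *read_base_counts):
    """Average read coverage: total sequenced bases / genome length."""
    return sum(read_base_counts) / genome_length


# Made-up toy data standing in for wildtype.fna and the two read files.
reference = io.StringIO(">toy\nACGTACGTAC\nGTACGTACGT\n")
reads_r1 = io.StringIO("@a/1\nACGTACGTAC\n+\nIIIIIIIIII\n@b/1\nGTACGTACGT\n+\nIIIIIIIIII\n")
reads_r2 = io.StringIO("@a/2\nGTACGTACGT\n+\nIIIIIIIIII\n@b/2\nACGTACGTAC\n+\nIIIIIIIIII\n")

genome_len = fasta_length(reference)
depth = coverage(genome_len, fastq_bases(reads_r1), fastq_bases(reads_r2))
print(f"genome length: {genome_len} bp, coverage: {depth:.1f}x")
```

To apply the same calculation to the real files, replace the `io.StringIO` objects with `open("wildtype.fna")`, `open("mutant_R1.fastq")` and `open("mutant_R2.fastq")`; the result should come out close to the 19x stated above.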
Created:
2020-01-24



