Training data for 'Genome annotation with Funannotate' tutorial (Galaxy Training Material)
Source: Mendeley Data; updated 2024-05-17; indexed 2024-06-27
Download link:
https://zenodo.org/records/7867921
Description:
The data provided here are part of the Galaxy Training Network (GTN) tutorial on genome annotation with Funannotate. The genome was assembled following the GTN Flye assembly tutorial and then masked with RepeatMasker. RNA-seq reads from SRR8534859 were mapped to the genome with STAR (toolshed.g2.bx.psu.edu/repos/iuc/rgrnastar/rna_star/2.7.8a+galaxy0), and the resulting BAM file was downsampled to 10% of its reads with Picard DownsampleSam (toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_DownsampleSam/2.18.2.1) to reduce the size of the dataset. FASTQ files were then extracted from the downsampled BAM file with Picard SamToFastq (toolshed.g2.bx.psu.edu/repos/devteam/picard/picard_SamToFastq/2.18.2.1). SwissProt_subset.fasta is a subset of SwissProt proteins that show some similarity to the genome: they were identified by aligning with Diamond against the genome and keeping the sequences whose matches had an e-value < 0.0001.
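The 10% downsampling step can be illustrated with a short Python sketch. This is not Picard's code: the functions `keep_read` and `downsample` are hypothetical, and the sketch assumes reads arrive as `(read_name, payload)` pairs. Each read is kept or dropped based on a hash of its read name, so both mates of a pair receive the same decision. In Picard DownsampleSam, the kept fraction is set with the `PROBABILITY` parameter.

```python
import hashlib
from typing import Iterable, Iterator, Tuple


def keep_read(read_name: str, fraction: float, seed: int = 42) -> bool:
    """Deterministically decide whether to keep a read (hypothetical helper).

    Hashing the read name instead of drawing a fresh random number per
    record means both mates of a pair get the same keep/drop decision.
    """
    digest = hashlib.sha256(f"{seed}:{read_name}".encode()).digest()
    # Map the first 8 bytes of the digest to a float in [0, 1).
    value = int.from_bytes(digest[:8], "big") / 2**64
    return value < fraction


def downsample(
    records: Iterable[Tuple[str, object]], fraction: float = 0.10, seed: int = 42
) -> Iterator[Tuple[str, object]]:
    """Yield only the (read_name, payload) records whose name passes keep_read."""
    for name, payload in records:
        if keep_read(name, fraction, seed):
            yield name, payload
```

Using a fixed seed makes the selection reproducible across runs. The trade-off is that the kept fraction is only approximately 10%, not exactly 10%.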
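The e-value filter used to build SwissProt_subset.fasta can be sketched in Python. The exact Diamond command is not given, so this sketch assumes a `diamond blastx` run with the genome as query and SwissProt as the database, written in Diamond's default tabular format (`--outfmt 6`, where column 2 is the subject ID and column 11 is the e-value). The helpers `hit_ids` and `subset_fasta` are hypothetical.

```python
from typing import Iterable, Iterator, Set


def hit_ids(diamond_tsv_lines: Iterable[str], max_evalue: float = 1e-4) -> Set[str]:
    """Collect subject IDs with e-value < max_evalue from tabular Diamond output.

    Assumes the default --outfmt 6 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    ids: Set[str] = set()
    for line in diamond_tsv_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank or comment lines
        cols = line.rstrip("\n").split("\t")
        sseqid, evalue = cols[1], float(cols[10])
        if evalue < max_evalue:
            ids.add(sseqid)
    return ids


def subset_fasta(fasta_lines: Iterable[str], keep_ids: Set[str]) -> Iterator[str]:
    """Yield FASTA lines for records whose ID (first word of the header) is in keep_ids."""
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            keep = bool(fields) and fields[0] in keep_ids
        if keep:
            yield line
```

Filtering on the subject column collects the matching SwissProt proteins, not the genome contigs. If the alignment had been run in the other direction, the query column would be used instead.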
Created:
2023-06-28



