Drosophila serrata genome scaffolding and annotation

Name: Drosophila serrata genome scaffolding and annotation
Creator: The University of Queensland
License: No description available

Indexed from Research Data Australia on 2024-12-14

Download link:

https://researchdata.edu.au/drosophila-serrata-genome-scaffolding-annotation/3304703

Resource description:

Supplementary files required for https://github.com/scottlallen/DserSweeps

The reference genome of D. serrata was created using long-read sequencing technology and has a length of 198 Mbp and a contig N50 of 0.94 Mbp (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002093755.2/). We subsequently used Dovetail HiRise and Hi-C methods to scaffold those contigs and achieved a scaffold N50 of 30.3 Mb. The six largest scaffolds span 80% of the genome and reach near chromosome-arm level length except for 2L, which is spanned by two large scaffolds of 21 Mb and 8.7 Mb. The genome was annotated by NCBI and lifted over to the Hi-C genome.

File Descriptions:

FASTA Files:
- drosophila_06Jul2018_A8VGg.fasta : Original Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar.fasta : Hi-C genome sequence with special characters removed from scaffold names.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED.fasta : Masked version of the Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED_shortName.fasta : Masked version of the Hi-C genome sequence with short scaffold names.
- top6.anc.fa : Hi-C genome sequence of the 6 longest scaffolds specifying the ancestral sequence.

GFF Files:
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted.gff : NCBI Annotation file converted to Hi-C scaffolds.
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted_noSpecialChar.gff : Annotation file converted to Hi-C scaffolds with special characters removed from scaffold names.

FAA File:
- GCF_002093755.1_Dser1.0_protein.faa : Protein sequences.

FNA File:
- GCF_002093755.1_Dser1.0_rna_from_genomic.fna : Coding sequences.
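The assembly statistics quoted above (N50 and the share of the genome covered by the largest scaffolds) can be recomputed from any of the FASTA files. The following is a minimal sketch using only the Python standard library, not the authors' own pipeline; the function names read_fasta_lengths, n50 and top_k_fraction are hypothetical helpers introduced here for illustration, and N50 is taken in its usual sense: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total length.

```python
from typing import Dict, Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Return {sequence_name: length} from FASTA-formatted lines."""
    lengths: Dict[str, int] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            lengths[name] = 0
        elif name is not None:
            # Sequence line: add its residues to the current record.
            lengths[name] += len(line)
    return lengths


def n50(lengths: List[int]) -> int:
    """Length of the sequence at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def top_k_fraction(lengths: List[int], k: int) -> float:
    """Fraction of the total length covered by the k longest sequences."""
    total = sum(lengths)
    if total == 0:
        return 0.0
    return sum(sorted(lengths, reverse=True)[:k]) / total
```

To apply this to a file from the dataset, pass an open file handle, for example read_fasta_lengths(open("drosophila_06Jul2018_A8VGg_noSpecialChar.fasta")), then call n50(list(lengths.values())) and top_k_fraction(list(lengths.values()), 6). Reading line by line keeps memory use low, since only the per-scaffold lengths are stored.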

Provided by:

The University of Queensland

Collected summary

Dataset introduction

Background and challenges

Background overview

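This dataset provides scaffolding and annotation files for the genome of the fruit fly Drosophila serrata. The genome was built using long-read sequencing technology and has a total length of 198 Mbp; Hi-C scaffolding raised the scaffold N50 to 30.3 Mb, and the six largest scaffolds cover 80% of the genome. The dataset contains the genome sequence, the NCBI annotation, and protein and coding sequences in several file formats, and is suited to research in genomics, bioinformatics, and evolutionary biology.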

The summary above was collected and generated by 遇见数据集.
