Drosophila serrata genome scaffolding and annotation

Source: DataCite Commons. Updated: 2026-01-21. Indexed: 2025-04-16.
Download link:
https://espace.library.uq.edu.au/view/UQ:ce96855
Description:
Supplementary files required for https://github.com/scottlallen/DserSweeps

The reference genome of D. serrata was created using long-read sequencing technology and has a length of 198 Mbp and a contig N50 of 0.94 Mbp (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002093755.2/). We subsequently used Dovetail HiRise and Hi-C methods to scaffold those contigs and achieved a scaffold N50 of 30.3 Mb. The six largest scaffolds span 80% of the genome and approach chromosome-arm length, except for 2L, which is spanned by two large scaffolds of 21 Mb and 8.7 Mb. The genome was annotated by NCBI, and the annotation was lifted over to the Hi-C genome.

File Descriptions:

FASTA Files:
- drosophila_06Jul2018_A8VGg.fasta : Original Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar.fasta : Hi-C genome sequence with special characters removed from scaffold names.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED.fasta : Masked version of the Hi-C genome sequence.
- drosophila_06Jul2018_A8VGg_noSpecialChar_MASKED_shortName.fasta : Masked version of the Hi-C genome sequence with short scaffold names.
- top6.anc.fa : Hi-C genome sequence of the 6 longest scaffolds specifying the ancestral sequence.

GFF Files:
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted.gff : NCBI annotation file converted to Hi-C scaffolds.
- GCF_002093755.1_Dser1.0_genomic_OGcontigs_NOregion_HiC_liftOver_sorted_noSpecialChar.gff : Annotation file converted to Hi-C scaffolds with special characters removed from scaffold names.

FAA File:
- GCF_002093755.1_Dser1.0_protein.faa : Protein sequences.

FNA File:
- GCF_002093755.1_Dser1.0_rna_from_genomic.fna : Coding sequences.
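
The contig and scaffold N50 values quoted in the description are summary statistics over sequence lengths: the N50 is the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly. The following is a minimal sketch of how such a value could be computed from a FASTA file. The function names `n50` and `fasta_lengths` are illustrative assumptions, not part of the dataset or of the DserSweeps repository, and the toy records stand in for a real file.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half:
            return length
    return 0


def fasta_lengths(lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # None until the first header line is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length


# Toy input (hypothetical records, not taken from the dataset).
toy = [">scaffold_a", "ACGT", "AC", ">scaffold_b", "GG"]
print(n50(fasta_lengths(toy)))  # 6
```

To apply the same functions to one of the files listed above, an open file handle can be passed in place of the toy list, for example `n50(fasta_lengths(open("drosophila_06Jul2018_A8VGg_noSpecialChar.fasta")))`, assuming the file has been downloaded to the working directory.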
Provider:
The University of Queensland
Created:
2024-07-23