
Novel Megaptera novaeangliae (Humpback whale) haplotype reference genome

DataCite Commons. Updated 2025-04-01; indexed 2025-04-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.dv41ns271
Description:
Sequencing of a kidney sample (KW2013002) from a stranded Megaptera novaeangliae (Humpback whale) calf produced the first chromosome-level reference genome for this species. The calf, a 457 cm, 2,500 lb male, was found stranded in Hawai’i Kai, HI, in 2013 and was marked as abandoned/orphaned. In 2023, 1 g of kidney tissue was sequenced using PacBio long-read DNA sequencing, chromatin conformation capture (Hi-C), RNA sequencing, and mitochondrial sequencing to comprehensively characterize the genome and transcriptome of M. novaeangliae. The reference genome was compared to the preexisting M. novaeangliae scaffold-level assembly to determine assembly improvements. Data validation includes a synteny analysis, mitochondrial annotation, and a comparison of BUSCO scores (scaffold v. reference genome, and Balaenoptera musculus (Blue whale) v. M. novaeangliae). To gauge the genomic completeness of the reference genome, BUSCO analysis was also run on the M. novaeangliae scaffold-level assembly: the scaffold scored 91.2%, versus 95.4% for the reference genome (Table I). Synteny analysis was performed with the B. musculus genome as the comparison, to assess chromosome-level coverage and structure. Further, a time-based phylogenetic tree was constructed using the sequenced data and publicly available genomes.
This dataset also contains the results of de novo repeat identification and gene annotation for the Humpback whale (Megaptera novaeangliae) genome. Repeat families were identified and classified using RepeatModeler, and gene prediction was conducted using AUGUSTUS and SNAP, incorporating coding sequences from related cetaceans. The resulting gene models were further refined using the MAKER pipeline, with protein evidence from Swiss-Prot and related species. tRNA genes were identified with tRNAscan-SE.
The dataset includes the transcript sequences (GIU3625_Humpback_whale.transcript.fasta.gz), annotation file (GIU3625_Humpback_whale.annotation.gff.gz), and a methods file (methods.txt) detailing the bioinformatic processes.
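For readers who download the files listed above, a first sanity check might be to tally what they contain. The following is a minimal sketch, not part of the dataset or its methods file. It assumes the annotation follows the standard tab-delimited GFF layout (feature type in the third column, optionally followed by a `##FASTA` section) and that both files have been saved locally under their original names. The helper names `count_gff_features` and `count_fasta_records` are hypothetical.

```python
import gzip
from collections import Counter


def count_gff_features(path):
    """Tally the feature types (column 3) in a gzipped GFF file.

    Assumes tab-delimited GFF records; stops at an embedded ##FASTA section.
    """
    counts = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # sequence data follows; no more feature rows
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments, and blank lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                counts[fields[2]] += 1
    return counts


def count_fasta_records(path):
    """Count sequence records (header lines starting with '>') in a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

Under those assumptions, `count_gff_features("GIU3625_Humpback_whale.annotation.gff.gz")` would return a counter of feature types such as genes and mRNAs, and `count_fasta_records("GIU3625_Humpback_whale.transcript.fasta.gz")` would return the number of transcript sequences. Comparing the two is a quick way to check that the downloads are intact and mutually consistent.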
Provider:
Dryad
Created:
2024-08-19