Novel Megaptera novaeangliae (Humpback whale) haplotype reference genome
Source: DataCite Commons; updated 2025-04-01; indexed 2025-04-10
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.dv41ns271
Resource summary:
Sequencing of a kidney sample (KW2013002) from a stranded Megaptera
novaeangliae (Humpback whale) calf produced the first chromosome-level
reference genome for this species. The calf, a 457 cm, 2,500 lb male, was
found stranded in Hawai’i Kai, HI, in 2013 and was marked as
abandoned/orphaned.
In 2023, 1 g of kidney tissue was sequenced using PacBio long-read DNA
sequencing, chromatin conformation capture (Hi-C), RNA sequencing, and
mitochondrial sequencing to comprehensively characterize the genome and
transcriptome of M. novaeangliae. The reference genome was compared with
the preexisting M. novaeangliae scaffold-level assembly to quantify
assembly improvements. Data validation includes a synteny analysis,
mitochondrial annotation, and two comparisons of BUSCO scores (scaffold
vs. reference genome, and Balaenoptera musculus (Blue whale) vs.
M. novaeangliae). To assess genomic completeness, BUSCO analysis was run
on both the earlier scaffold-level assembly and the new reference genome:
the scaffold scored 91.2%, versus 95.4% for the reference genome
(Table I). Synteny analysis against the B. musculus genome was used to
assess chromosome-level coverage and structure. In addition, a
time-calibrated phylogenetic tree was constructed from the sequenced data
and publicly available genomes. This dataset also contains the results of
de novo repeat identification and gene annotation for the Humpback whale
(Megaptera novaeangliae) genome.
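
BUSCO expresses completeness as the share of a lineage dataset's
near-universal single-copy orthologs that are recovered complete (as
single-copy or duplicated genes) in an assembly, so the 91.2% and 95.4%
scores are directly comparable only when both runs use the same lineage
dataset. The sketch below is a minimal illustration, assuming BUSCO's usual
one-line summary notation (C:x%[S:x%,D:x%],F:x%,M:x%,n:N); the function
names and the example counts are hypothetical and are not the values behind
Table I.

```python
import re

# Assumed BUSCO one-line summary format (hypothetical example values):
#   C:95.0%[S:93.0%,D:2.0%],F:1.5%,M:3.5%,n:9226
# Any trailing fields after n:N are ignored by search().
_SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line: str) -> dict:
    """Extract the C/S/D/F/M percentages and n from a BUSCO summary line."""
    match = _SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    values = {k: float(v) for k, v in match.groupdict().items() if k != "n"}
    values["n"] = int(match.group("n"))
    return values


def completeness(single: int, duplicated: int, total: int) -> float:
    """Percentage of BUSCOs found complete (single-copy + duplicated)."""
    return 100.0 * (single + duplicated) / total


def improvement(old_pct: float, new_pct: float) -> float:
    """Difference in percentage points between two completeness scores."""
    return round(new_pct - old_pct, 1)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(completeness(single=900, duplicated=50, total=1000))  # 95.0
    # Scores reported for this dataset: scaffold 91.2%, reference 95.4%.
    print(improvement(91.2, 95.4))  # 4.2
```

With the reported scores, improvement(91.2, 95.4) gives a gain of 4.2
percentage points in BUSCO completeness.
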
Repeat families were identified and classified with RepeatModeler, and
gene prediction was performed with AUGUSTUS and SNAP, incorporating coding
sequences from related cetaceans. The resulting gene models were further
refined with the MAKER pipeline, using protein evidence from Swiss-Prot
and related species. tRNA genes were identified with tRNAscan-SE. The
dataset includes the transcript sequences
(GIU3625_Humpback_whale.transcript.fasta.gz), the annotation file
(GIU3625_Humpback_whale.annotation.gff.gz), and a methods file
(methods.txt) detailing the bioinformatic processes.
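
To get a quick overview of the annotation and transcript files, a minimal
sketch using only the Python standard library is shown below. It assumes
that the annotation file is gzip-compressed GFF3 (nine tab-separated
columns, feature type in column 3) and that the transcript file is
gzip-compressed FASTA; which feature types actually appear (gene, mRNA,
exon, tRNA, and so on) is not stated here and should be checked against the
file. The function names are hypothetical.

```python
import gzip
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def count_gff_features(path: str) -> Counter:
    """Count GFF3 features by type (column 3) in a gzipped GFF file."""
    counts: Counter = Counter()
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##FASTA"):
                break  # GFF3 may embed sequences after this directive
            if not line.strip() or line.startswith("#"):
                continue  # skip blank lines, comments and ## directives
            fields = line.rstrip("\n").split("\t")
            if len(fields) == 9:
                counts[fields[2]] += 1
    return counts


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a gzipped FASTA file."""
    record_id, chunks = None, []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if record_id is not None:
                    yield record_id, "".join(chunks)
                # The ID is the first whitespace-delimited word of the header.
                record_id, chunks = (line[1:].split() or [""])[0], []
            elif line:
                chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Length at which the running total (longest first) reaches half the sum."""
    ordered = sorted(lengths, reverse=True)
    half, running = sum(ordered) / 2, 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def summarize_transcripts(path: str) -> Dict[str, int]:
    """Transcript count, total length in bp, and N50 for a gzipped FASTA."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {"transcripts": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

For example, count_gff_features("GIU3625_Humpback_whale.annotation.gff.gz")
followed by .most_common() lists feature types by frequency, and
summarize_transcripts("GIU3625_Humpback_whale.transcript.fasta.gz") reports
the transcript count, total length, and N50. Plain stdlib parsing keeps the
sketch dependency-free; libraries such as Biopython or gffutils are
alternatives for richer queries.
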
Providing institution:
Dryad
Created:
2024-08-19



