
A chromosome-scale reference genome assembly of the great sand eel, Hyperoplus lanceolatus

NIAID Data Ecosystem, indexed 2026-03-14
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.7pvmcvdxv
Resource description:
Despite increasing sequencing efforts, numerous fish families still lack a reference genome, which complicates genetic research. One such understudied family is the sand lances (Ammodytidae, literally: ‘sand burrower’), a globally distributed clade of over 30 fish species that tend to avoid tidal currents by burrowing into the sand. Here, we present the first annotated chromosome-level genome assembly of the great sand eel (Hyperoplus lanceolatus). The genome assembly was generated using Oxford Nanopore Technologies long reads, with Illumina short reads for polishing. The final assembly has a total length of 808.5 Mbp, of which 97.1% was anchored into 24 chromosome-scale scaffolds using proximity-ligation scaffolding. The assembly is highly contiguous, with a scaffold N50 of 33.7 Mbp and a contig N50 of 31.3 Mbp, and has a BUSCO completeness score of 96.9%. The presented genome assembly is a valuable resource for future studies of sand lances, which are of great ecological and commercial importance, and may also contribute to studies aiming to resolve the suprafamiliar taxonomy of bony fishes.

Methods

Genome assembly

We assembled the genome of Hyperoplus lanceolatus from Oxford Nanopore (ONT) reads with WTDBG2 v.2.5 (Ruan & Li, 2019) using the preset for ONT reads (flag '-x ont'), followed by three iterations of long-read polishing with racon v.1.4.3 (Vaser et al., 2017), one iteration of polishing with Medaka v.0.11.5 (Oxford Nanopore Technologies LTD., 2018) and three iterations of short-read polishing with pilon v.1.23 (Walker et al., 2014). The assembly was scaffolded into chromosome-scale scaffolds with the Dovetail Genomics HiRise pipeline (Putnam et al., 2016) using proximity-ligation data generated with the Dovetail Omni-C kit. Subsequently, gap-closing was performed using TGS-GapCloser v.1.1.1 (Xu et al., 2020), followed by the removal of haplotigs with purge_dups v.1.2.5 (Guan et al., 2020).

The resulting final assembly, including the mitochondrial genome generated with MitoZ v.2.4 (Meng et al., 2019), can be found under the filename:
TBG_H_lanceolatus_asm_v1.1.fasta

Transcriptome

A transcriptome was assembled from Illumina RNAseq data generated from brain, heart, gill, muscle, liver, gonad, and pyloric gland tissue, following the best practice guidelines described at https://informatics.fas.harvard.edu/best-practices-for-de-novo-transcriptome-assembly-with-trinity.html

The transcriptome can be found under the filename:
Hlan002_transcriptome_cleaned_for_ncbi_final.fasta

Annotation

Prior to gene annotation, we used RepeatModeler v.2.0.1 (Flynn et al., 2020) to generate a de novo repeat library. This library was combined with an Actinopterygii-specific library from RepBase (Bao et al., 2015) and used as a custom repeat library for the masking of repeats with RepeatMasker v.4.1.0 (http://www.repeatmasker.org/RMDownload.html). First, we hard-masked all repeats in the assembly. In addition, we generated a masked assembly with hard-masked interspersed repeats and soft-masked simple repeats.
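The difference between the two masking styles can be checked directly on a FASTA file. The following is a minimal sketch, assuming the usual convention that hard-masked bases are written as N and soft-masked bases as lowercase a/c/g/t. The function name `masking_summary` and the example record are invented for illustration and are not part of the dataset's protocol file.

```python
from typing import Dict, Iterable


def masking_summary(fasta_lines: Iterable[str]) -> Dict[str, float]:
    """Summarise hard- and soft-masked bases in FASTA-formatted lines.

    Assumption: hard-masked bases appear as N/n and soft-masked bases as
    lowercase a/c/g/t. Assembly gaps are also written as N, so the
    hard-masked count is an upper bound on repeat content.
    """
    total = hard = soft = 0
    for raw in fasta_lines:
        line = raw.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        hard += line.count("N") + line.count("n")
        soft += sum(1 for base in line if base in "acgt")
    return {
        "total_bp": total,
        "hard_masked_bp": hard,
        "soft_masked_bp": soft,
        "hard_masked_fraction": hard / total if total else 0.0,
        "soft_masked_fraction": soft / total if total else 0.0,
    }


# Invented toy record: 16 bases, 4 hard-masked and 4 soft-masked.
example = [">scaffold_1 (invented)", "ACGTNNNNacgt", "ACGT"]
summary = masking_summary(example)
```

An open file handle can be passed in place of the list, since the function only iterates over lines and never loads the full assembly into memory.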
The masked assembly files, the de novo repeat library, and the related RepeatMasker output files can be found under the following filenames:

De novo repeat library:
consensi.fa.classified

All repeats hard-masked:
TBG_H_lanceolatus_asm_final.purged_mtgenome.fa.masked
TBG_H_lanceolatus_asm_final.purged_mtgenome.fa.tbl
TBG_H_lanceolatus_asm_final.purged_mtgenome.fa.out

Interspersed repeats hard-masked:
TBG_H_lanceolatus_asm_final_hardmaskedTEs.fasta
TBG_H_lanceolatus_asm_final_hardmaskedTEs.out
TBG_H_lanceolatus_asm_final_hardmaskedTEs.tbl

Interspersed repeats hard-masked (see above) and simple repeats soft-masked:
TBG_H_lanceolatus_asm_final_hardmaskedTEs_softmaskedSR.fasta
TBG_H_lanceolatus_asm_final_hardmaskedTEs_softmaskedSR.out
TBG_H_lanceolatus_asm_final_hardmaskedTEs_softmaskedSR.tbl

Homology-based gene prediction was performed with the GeMoMa pipeline v.1.7.1 with mapped RNAseq data as evidence and the following five references:
Acanthochromis polyacanthus (GCA_002109545.1)
Perca fluviatilis (GCA_010015445.1)
Gasterosteus aculeatus (GCA_016920845.1)
Betta splendens (GCA_900634795.3)
Acanthopagrus latus (GCA_904848185.1)

The predicted proteins were functionally annotated with InterProScan and BlastP against the Swiss-Prot database.

Annotation files:
H_lanceolatus_GeMoMa_all.fun.gff
H_lanceolatus_GeMoMa_proteins.fun.fasta
H_lanceolatus_GeMoMa_CDS.fun.fasta
H_lanceolatus_GeMoMa_summary

All commands used to generate the assembly, the annotation, and additional analyses are listed in the protocol file:
H_lanceolatus_assembly_commands.txt
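As a quick sanity check on an annotation file such as the GFF listed above, the number of features per type can be tallied from the third column. The following is a minimal sketch, assuming the file follows the standard nine-column, tab-separated GFF layout. The function name `count_feature_types` and the sample lines are invented for illustration and do not reproduce the actual contents of the dataset.

```python
from collections import Counter
from typing import Iterable


def count_feature_types(gff_lines: Iterable[str]) -> Counter:
    """Tally GFF features by type (column 3).

    Assumption: standard tab-separated GFF with nine columns. Comment and
    directive lines (starting with '#') and malformed lines are skipped.
    """
    counts: Counter = Counter()
    for raw in gff_lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete feature line
        counts[columns[2]] += 1
    return counts


# Invented sample lines; IDs, coordinates and source field are hypothetical.
sample = [
    "##gff-version 3",
    "scaffold_1\tGeMoMa\tgene\t100\t900\t.\t+\t.\tID=gene_1",
    "scaffold_1\tGeMoMa\tmRNA\t100\t900\t.\t+\t.\tID=mrna_1;Parent=gene_1",
    "scaffold_1\tGeMoMa\tCDS\t100\t400\t.\t+\t0\tParent=mrna_1",
    "scaffold_1\tGeMoMa\tCDS\t600\t900\t.\t+\t0\tParent=mrna_1",
]
feature_counts = count_feature_types(sample)
```

Passing an open file handle for the GFF instead of the sample list gives per-type counts for the whole annotation, which can then be compared against the accompanying summary file.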
Created:
2023-01-13