Supplementary data for: Chromosome-level genome assembly and circadian gene repertoire of the Patagonia blennie Eleginops maclovinus
DataONE | Updated 2023-05-30 | Indexed 2024-06-08
Download link:
https://search.dataone.org/view/sha256:47db74e4abd05a1c4bc13753d118bb47f7977571abdb9c432c523f7282e856e4
Resource summary:
This dataset contains the genome assembly and associated annotation of the Patagonian blennie (Eleginops maclovinus), the closest extant taxon to the Antarctic notothenioid radiation. In addition to characterizing the E. maclovinus genome, the dataset includes a description of circadian rhythm orthologs for E. maclovinus, other notothenioid taxa, and teleost outgroups, as well as a copy of the bioinformatic scripts used for the assembly, annotation, and other downstream analyses.

An E. maclovinus specimen was collected from Puerto Natales, Chile, in January 2018. High-molecular-weight (HMW) DNA was extracted and sequenced on a PacBio Sequel II, and a Hi-C library was also sequenced. A contig-level genome assembly was first generated using wtdbg2 (a.k.a. redbean) v2.5 (Ruan & Li 2020) and then scaffolded with juicer v1.6.2 (Durand et al. 2016). PacBio and Hi-C raw data are available under NCBI BioProject PRJNA857989. For annotation, the RNA-seq data generated by Bilyk et al. (2018) were aligned to the genome and processed using BRAKER v2.1.6 (Brůna et al. 2021). The resulting annotation was then further processed using TSEBRA v1.0.1 (Gabriel et al. 2021). Using a custom Python script (see the scripts section), we curated the TSEBRA output to ensure consistent naming of genes and transcripts and to incorporate gene names and descriptions based on the corresponding zebrafish orthologs.
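The curation script itself is part of the dataset and is not reproduced here. Purely as an illustration of the kind of step described above, the following Python sketch assigns consistent gene and transcript IDs to a TSEBRA-style GTF and attaches zebrafish ortholog names as a `gene_name` attribute. Everything in it is an assumption rather than the authors' method: the function names, the `EMAC` prefix and zero-padded ID scheme, and the `orthologs` mapping (original gene ID to zebrafish gene name), which is presumed to have been computed beforehand. It also assumes every feature line carries `gene_id` and `transcript_id` key-value attributes; lines without them would need extra handling.

```python
import re
from typing import Dict, Iterable, List

# Matches `key "value"` pairs in GTF column 9.
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attrs(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column into a dict (insertion order preserved)."""
    return dict(ATTR_RE.findall(field))


def format_attrs(attrs: Dict[str, str]) -> str:
    """Serialize attributes back to GTF `key "value";` form."""
    return " ".join(f'{key} "{value}";' for key, value in attrs.items())


def rename_gtf(lines: Iterable[str], prefix: str,
               orthologs: Dict[str, str]) -> List[str]:
    """Assign stable gene/transcript IDs in order of first appearance.

    `orthologs` (hypothetical) maps an original gene_id to a zebrafish
    gene name, which is added as a `gene_name` attribute.
    """
    gene_ids: Dict[str, str] = {}    # original gene_id -> new gene_id
    tx_ids: Dict[str, str] = {}      # original transcript_id -> new transcript_id
    tx_counts: Dict[str, int] = {}   # original gene_id -> transcripts seen so far
    out: List[str] = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            out.append(line.rstrip("\n"))  # pass comments/blank lines through
            continue
        cols = line.rstrip("\n").split("\t")
        attrs = parse_attrs(cols[8])
        old_gene, old_tx = attrs["gene_id"], attrs["transcript_id"]
        if old_gene not in gene_ids:
            gene_ids[old_gene] = f"{prefix}{len(gene_ids) + 1:06d}"
            tx_counts[old_gene] = 0
        if old_tx not in tx_ids:
            tx_counts[old_gene] += 1
            tx_ids[old_tx] = f"{gene_ids[old_gene]}.t{tx_counts[old_gene]}"
        attrs["gene_id"] = gene_ids[old_gene]
        attrs["transcript_id"] = tx_ids[old_tx]
        if old_gene in orthologs:
            attrs["gene_name"] = orthologs[old_gene]
        cols[8] = format_attrs(attrs)
        out.append("\t".join(cols))
    return out


if __name__ == "__main__":
    # Toy input; IDs, prefix, and gene name are placeholders, not dataset values.
    toy = [
        'chr1\tTSEBRA\tCDS\t100\t200\t.\t+\t0\tgene_id "g7"; transcript_id "g7.t1";',
        'chr1\tTSEBRA\tCDS\t300\t400\t.\t+\t1\tgene_id "g7"; transcript_id "g7.t2";',
    ]
    for row in rename_gtf(toy, "EMAC", {"g7": "zf_gene_x"}):
        print(row.split("\t")[8])
```

Numbering IDs in order of first appearance keeps the mapping deterministic for a given input order; a real pipeline would probably sort features by scaffold and coordinate first.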
A conserved synteny analysis using synolog (Catchen et al. 2009; Small et al. 2016) was employed for the manu...

All assembly and annotation files are gzipped but otherwise use standard bioinformatic formats (i.e., FASTA for the genome assembly and the coding and amino acid sequences, GTF for the annotation, and AGP for the scaffolding). The bioinformatic scripts for data generation and analysis are written in Python (*.py) or Bash (*.sh) and might require the installation of additional open-source software (e.g., wtdbg2, BRAKER).
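Because the files are gzipped but otherwise standard, they can be streamed without decompressing them to disk first. As a minimal sketch using only the Python standard library, the following reads a plain or gzipped FASTA file and reports the number of sequences, total length, N50, and count of `N` bases. The function names, script name, and command-line usage are illustrative assumptions, not part of the dataset's scripts.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may wrap over several lines
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Stream records from a plain or gzip-compressed (*.gz) FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        yield from parse_fasta(handle)


def n50(lengths: List[int]) -> int:
    """Shortest length among the longest sequences that cover half the total."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def assembly_stats(records: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count sequences, total length, N50, and N (gap/ambiguous) bases."""
    lengths: List[int] = []
    n_bases = 0
    for _, seq in records:
        lengths.append(len(seq))
        n_bases += seq.upper().count("N")
    return {"sequences": len(lengths), "total_bp": sum(lengths),
            "n50": n50(lengths), "n_bases": n_bases}


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fasta_stats.py <assembly>.fa.gz
    print(assembly_stats(read_fasta(sys.argv[1])))
```

Separating `parse_fasta` from `read_fasta` keeps the parser testable on in-memory lines and lets the same code handle compressed and uncompressed files.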
See the following links for the FASTA (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml), GTF (https://useast.ensembl.org/info/website/upload/gff.html), and AGP (https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/) file format specifications.
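Per the AGP specification linked above, each non-comment line has nine tab-separated columns describing one part of an object (here, a scaffold): columns 2 and 3 give the part's 1-based, inclusive position in the object, and column 5 gives the component type, where `N` and `U` mark gaps and other codes (such as `W` for a WGS contig) mark sequence. A minimal sketch, assuming well-formed input, that summarizes each scaffold by length, number of sequence components, and number and total size of gaps; the names are illustrative, not from the dataset's scripts:

```python
import gzip
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable

# AGP component types that represent gaps rather than sequence.
GAP_TYPES = {"N", "U"}


@dataclass
class ObjectSummary:
    """Per-object (e.g. per-scaffold) totals derived from AGP lines."""
    length: int = 0
    components: int = 0
    gaps: int = 0
    gap_bp: int = 0


def summarize_agp(lines: Iterable[str]) -> Dict[str, ObjectSummary]:
    """Summarize AGP rows by object; assumes well-formed, tab-separated input."""
    summary: Dict[str, ObjectSummary] = defaultdict(ObjectSummary)
    for raw in lines:
        if raw.startswith("#") or not raw.strip():
            continue  # skip header/comment and blank lines
        cols = raw.rstrip("\n").split("\t")
        obj, beg, end, comp_type = cols[0], int(cols[1]), int(cols[2]), cols[4]
        entry = summary[obj]
        entry.length = max(entry.length, end)  # 1-based, inclusive coordinates
        if comp_type in GAP_TYPES:
            entry.gaps += 1
            entry.gap_bp += end - beg + 1
        else:
            entry.components += 1
    return dict(summary)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python agp_summary.py <assembly>.agp.gz
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for name, s in summarize_agp(handle).items():
            print(name, s.length, s.components, s.gaps, s.gap_bp, sep="\t")
```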
File format specification

File suffix¹    Description
*.fa            Genome assembly in nucleotide FASTA format.
*.agp           Assembly structure in AGP format.
*.gtf           Genome annotation in GTF format.
*.cds.fa        Coding sequences for all annotated protein-coding genes in nucleotide FASTA format.
*.protein.fa    Protein sequence for all annotated prote...
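Since the `*.cds.fa` and `*.protein.fa` files describe the same protein-coding genes, one quick consistency check is to translate each coding sequence with the standard genetic code (NCBI translation table 1) and compare it with the matching protein record. The sketch below assumes that each CDS starts in frame 1, treats a terminal stop (`*`) as optional, and leaves pairing CDS and protein records by identifier to the caller; the function names are illustrative and not part of the dataset's scripts.

```python
from typing import Dict

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS from frame 1; codons with ambiguous bases become 'X'.

    A trailing partial codon (length not divisible by 3) is ignored.
    """
    seq = cds.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(0, len(seq) - len(seq) % 3, 3)
    )


def cds_matches_protein(cds: str, protein: str) -> bool:
    """True if the translated CDS equals the protein, ignoring a terminal stop."""
    return translate(cds).rstrip("*") == protein.strip().upper().rstrip("*")


if __name__ == "__main__":
    # Toy records, not taken from the dataset.
    print(translate("ATGGCCTAA"))                  # MA*
    print(cds_matches_protein("ATGGCCTAA", "MA"))  # True
```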
Record created: 2025-07-17



