Data associated with the reannotation of repeats in the Octopus vulgaris genome assembly (ASM119413v2)
Source: DataCite Commons. Updated 2026-05-06; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20056424
Description:
This is the repeat annotation data generated in "Biological implications of a detailed repeat annotation in Octopus vulgaris" (https://doi.org/10.64898/2026.03.03.709284) for the Octopus vulgaris ASM119413v2 assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/). This includes:
- the GFF file of the repeat annotation (OctVulg_genome_annotation_only.filteredRepeats.gff)
  - this is the raw annotation, before TE sequences smaller than 100 bp were filtered out for calculating summary information
- the FASTA file containing the reference TE sequences used, together with the new, curated TE and other repeat consensus sequences that were generated (O_vulgaris_and_reference_repeats_Oct24-2.fasta)
  - sequence headers start with 'REFERENCE' for reference sequences and with 'Ovulg' for new consensus sequences. The new sequences include several consensus sequences of zinc-finger gene arrays that were noticed during curation (#ZF-array), as well as satellites, unknown repeats and some RNA loci. The characters before the '#' symbol form a string that is unique to each sequence
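The header convention described above can be sketched in Python as follows. This is a minimal illustration, not code from the deposit: the function name `parse_repeat_header` and the example headers are hypothetical, and the sketch assumes only what the description states (a 'REFERENCE' or 'Ovulg' prefix, and a unique string before the '#' symbol).

```python
def parse_repeat_header(header):
    """Split a FASTA header into (origin, unique_id, classification).

    Assumes the convention in the description: the part before '#' is a
    unique identifier starting with 'REFERENCE' or 'Ovulg'; the part after
    '#' (if any) is treated here as a classification label.
    """
    # Drop the leading '>' and anything after the first whitespace.
    name = header.strip().lstrip(">").split()[0]
    unique_id, sep, label = name.partition("#")

    if unique_id.startswith("REFERENCE"):
        origin = "reference"
    elif unique_id.startswith("Ovulg"):
        origin = "new_consensus"
    else:
        origin = "other"

    classification = label if sep else None
    return origin, unique_id, classification


# Hypothetical headers, for illustration only.
print(parse_repeat_header(">Ovulg_example1#ZF-array"))
print(parse_repeat_header(">REFERENCE_example2#LINE"))
```

Grouping sequences by the returned `origin` value is one way to separate the curated Octopus vulgaris consensus sequences from the reference library before further analysis.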
The R markdown file (GBE_O_vulgaris_repeat_annotation.Rmd) contains the code for filtering short TE hits out of the GFF file, for selecting young elements, and for recreating Figures 1 and 2, including the hotspot/coldspot analysis. The pipeline requires two inputs: the GFF file of the repeat annotation (OctVulg_genome_annotation_only.filteredRepeats.gff) and the Octopus vulgaris ASM119413v2 assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/).
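The short-hit filtering step can be sketched in Python as below. This is an illustrative approximation, not the code in the R markdown file: the function name `filter_short_hits` and the sample records are hypothetical. It assumes standard GFF columns (start in column 4, end in column 5, 1-based and inclusive) and applies the 100 bp threshold to every feature line, whereas the actual pipeline may restrict the filter to TE features.

```python
MIN_LENGTH_BP = 100  # threshold stated in the description


def filter_short_hits(lines, min_length=MIN_LENGTH_BP):
    """Yield GFF lines whose feature length is at least min_length bp."""
    for line in lines:
        # Keep comment/directive lines and blank lines unchanged.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # skip malformed records
        start, end = int(fields[3]), int(fields[4])
        # GFF coordinates are 1-based and inclusive.
        if end - start + 1 >= min_length:
            yield line


# Hypothetical records, for illustration only.
sample = [
    "##gff-version 3\n",
    "chr1\tRepeatMasker\tdispersed_repeat\t1\t99\t.\t+\t.\tID=a\n",
    "chr1\tRepeatMasker\tdispersed_repeat\t1\t100\t.\t+\t.\tID=b\n",
]
for kept in filter_short_hits(sample):
    print(kept, end="")
```

To apply the same sketch to the deposited file, the generator can be fed an open file handle, for example `open("OctVulg_genome_annotation_only.filteredRepeats.gff")`, since it processes one line at a time.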
Provider:
Zenodo
Created:
2026-05-06



