
Data associated with the reannotation of repeats in the Octopus vulgaris genome assembly (ASM119413v2)

Source: DataCite Commons. Updated: 2026-05-06. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20056424
Description:
This is the repeat annotation data generated in "Biological implications of a detailed repeat annotation in Octopus vulgaris" (https://doi.org/10.64898/2026.03.03.709284) for the Octopus vulgaris ASM119413v2 assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/). It includes:
- The GFF file of the repeat annotation (OctVulg_genome_annotation_only.filteredRepeats.gff). This is the raw annotation, before TE sequences smaller than 100 bp were filtered out for calculating summary information.
- The FASTA file containing the reference TE sequences that were used, together with the new, curated TE and other repeat consensus sequences that were generated (O_vulgaris_and_reference_repeats_Oct24-2.fasta). Headers of reference sequences start with 'REFERENCE'; headers of new consensus sequences start with 'Ovulg'. The file includes several consensus sequences of zinc-finger gene arrays that were noticed during curation (#ZF-array), as well as satellites, unknown repeats and some RNA loci. The characters before the '#' symbol form a string that is unique to each sequence.

The R Markdown file (GBE_O_vulgaris_repeat_annotation.Rmd) contains the code for filtering the GFF file for short TE hits and for young elements, and for recreating Figures 1 and 2, including the hotspot/coldspot analysis. The inputs required for this pipeline are the GFF file of the repeat annotation (OctVulg_genome_annotation_only.filteredRepeats.gff) and the Octopus vulgaris ASM119413v2 assembly (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_001194135.2/).
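The header convention and the 100 bp length cutoff described above can be illustrated with a minimal Python sketch. This is not the authors' code (their pipeline is the R Markdown file), and several details are assumptions: the helper names `parse_header` and `filter_short_features` are hypothetical, the example headers in the usage note are invented, the GFF is assumed to follow the standard tab-separated layout with 1-based inclusive start and end in columns 4 and 5, and the length filter is applied to every feature because the record does not say how TE hits are told apart from other repeats.

```python
from typing import Iterable, Iterator, NamedTuple


class RepeatHeader(NamedTuple):
    unique_id: str  # characters before '#', unique to each sequence
    label: str      # characters after '#', e.g. "ZF-array"; "" if no '#'
    origin: str     # "reference", "new", or "unknown"


def parse_header(header: str) -> RepeatHeader:
    """Split one FASTA header into unique ID, label, and origin (hypothetical helper)."""
    fields = header.lstrip(">").split()
    name = fields[0] if fields else ""
    unique_id, _, label = name.partition("#")
    if unique_id.startswith("REFERENCE"):
        origin = "reference"
    elif unique_id.startswith("Ovulg"):
        origin = "new"
    else:
        origin = "unknown"
    return RepeatHeader(unique_id, label, origin)


def filter_short_features(lines: Iterable[str], min_len: int = 100) -> Iterator[str]:
    """Yield GFF lines whose feature spans at least min_len bp (hypothetical helper).

    Assumption: the filter applies to all features, not only TEs.
    Comment and blank lines pass through; malformed lines are skipped.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        start, end = int(cols[3]), int(cols[4])
        # GFF coordinates are 1-based and inclusive, so length = end - start + 1.
        if end - start + 1 >= min_len:
            yield line
```

For example, `parse_header(">Ovulg_example1#ZF-array")` (an invented header) yields the unique ID `Ovulg_example1`, the label `ZF-array`, and the origin `new`. Passing an open file handle for the GFF to `filter_short_features` streams the retained lines without loading the whole annotation into memory.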
Provider:
Zenodo
Created:
2026-05-06