
Supporting data for "a draft genome assembly of the sea slug Elysia chlorotica"

Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-07-27.
Download link:
https://figshare.com/articles/Supporting_data_for_a_draft_genome_assembly_of_the_sea_slug_Elysia_chlorotica_/7057916/2
Description:
Elysia chlorotica, a sacoglossan sea slug found off the East Coast of the United States, is well known for its ability to sequester chloroplasts from its algal prey and survive by photosynthesis for up to 12 months in the absence of a food supply. Here we present a draft genome assembly of E. chlorotica that was generated using a hybrid assembly strategy with Illumina short reads and PacBio long reads. The genome assembly comprises 9,989 scaffolds, with a total length of 557 Mb and a scaffold N50 of 442 kb. BUSCO assessment indicated that 93.3% of the expected metazoan genes were completely present in the genome assembly. Annotation of the E. chlorotica genome identified 176 Mb (32.6%) of repetitive sequences and a total of 24,980 protein-coding genes. We anticipate that the annotated draft genome assembly of the E. chlorotica sea slug will promote the investigation of sacoglossan genetics and evolution and, in particular, of the genetic signatures accounting for the long-term functioning of algal chloroplasts in an animal.

Genome assembly and annotation files provided in this dataset:
1. Elysia_chlorotica.fa.gz: genome assembly of Elysia chlorotica in FASTA format.
2. Elysia_chlorotica.gene.gff.gz: protein-coding gene annotation in GFF3 format.
3. Elysia_chlorotica.gene.cds.gz: coding sequences of the protein-coding genes in FASTA format.
4. Elysia_chlorotica.gene.pep.gz: peptide sequences of the protein-coding genes in FASTA format.
5. Elysia_chlorotica.ProteinMask.gff.gz: homology-based repetitive elements identified by searching against a TE protein database with RepeatProteinMask, in GFF3 format.
6. Elysia_chlorotica.RepeatMasker.gff.gz: homology-based repetitive elements identified by searching against Repbase with RepeatMasker, in GFF3 format.
7. Elysia_chlorotica.RepeatModeler.gff.gz: de novo repetitive elements identified by RepeatModeler followed by RepeatMasker, in GFF3 format.
8. Elysia_chlorotica.TRF.gff.gz: tandem repeats identified by Tandem Repeats Finder, in GFF3 format.
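
The assembly statistics quoted in the description (scaffold count, total length, scaffold N50) can in principle be recomputed from the gzipped FASTA file. The following is a minimal sketch using only the Python standard library. It is not the authors' pipeline: the function names are illustrative, and it assumes the file has been downloaded locally under the name given in the file list.

```python
import gzip
from typing import Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence found in FASTA-formatted lines."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif length is not None:
            length += len(line)
    if length is not None:
        yield length  # finish the last record


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for value in sorted(lengths, reverse=True):
        running += value
        if 2 * running >= total:
            return value
    return 0  # empty input


def summarize(path: str) -> dict:
    """Compute scaffold count, total length, and N50 for a gzipped FASTA file."""
    with gzip.open(path, "rt") as handle:
        lengths = list(fasta_lengths(handle))
    return {"scaffolds": len(lengths), "total_bp": sum(lengths), "n50": n50(lengths)}
```

Calling `summarize("Elysia_chlorotica.fa.gz")` on the downloaded file should give figures that can be compared with the 9,989 scaffolds, 557 Mb, and 442 kb reported above. Small differences are possible depending on how gaps and ambiguous bases are counted.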

Provider:
figshare
Created:
2019-01-07