Sterechinus neumayeri v1 genome annotation
Source: Mendeley Data. Indexed 2026-04-09.
Download link:
https://data.mendeley.com/datasets/byjrp4xns7
Description:
Genome annotation was performed using the BRAKER3 pipeline (Gabriel et al., 2023). A repeat library was generated with RepeatModeler (Smit & Hubley, 2010). The repeat families were compared to known echinoderm protein-coding gene models using BLAST, and any repeat with a significant hit (e-value < 5e-5) was removed. The resulting repeat library was used to identify and mask repeats with RepeatMasker (Smit et al., 2010) prior to annotation. BRAKER was run using protein models from S. purpuratus (available on Echinobase: https://download.xenbase.org/echinobase/Genomics/Spur5.0/sp5_0_GCF.gff3.gz ; last accessed 2/15/24), L. variegatus (available on Echinobase: https://download.xenbase.org/echinobase/Genomics/Lvar3.0/Lvar3_0_GCF_proteins.fa.gz ; last accessed 2/15/24), and L. pictus (available on Echinobase: https://download.xenbase.org/echinobase/Genomics/Lpic2.1/Lpic2_1_GCF_proteins.fa.gz ; last accessed 2/15/24), along with publicly available transcriptomic data from the NCBI Sequence Read Archive and an in-house RNA-seq dataset spanning embryogenesis. The gene models were named using the notation SNE_XXXXXX and assessed for ‘completeness’ with BUSCO version 4 using the metazoan gene set (Simão et al., 2015). Curation of gene regulatory network (GRN) genes began with compiling a list of GRN genes (supplemental file X). OrthoFinder2 (Emms & Kelly, 2019) was then used to identify orthologs between S. neumayeri, L. variegatus, S. purpuratus, and the recently published genome annotation for P. lividus (Marlétaz et al., 2023).
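The repeat-library filtering step described above can be illustrated with a minimal Python sketch. It is not the authors' script. It assumes the BLAST results were written in tabular format (`-outfmt 6`), with the repeat family as the query in column 1 and the e-value in column 11, and it assumes that the six X's in SNE_XXXXXX stand for a zero-padded number. The function names and example identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set

EVALUE_CUTOFF = 5e-5  # threshold stated in the description


def repeats_with_protein_hits(blast_lines: Iterable[str],
                              cutoff: float = EVALUE_CUTOFF) -> Set[str]:
    """Collect repeat-family IDs that have at least one hit below the cutoff.

    Assumption: BLAST tabular output (-outfmt 6), where column 1 is the
    query (repeat family) and column 11 is the e-value.
    """
    flagged: Set[str] = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if float(fields[10]) < cutoff:
            flagged.add(fields[0])
    return flagged


def filter_repeat_library(families: Dict[str, str],
                          flagged: Set[str]) -> Dict[str, str]:
    """Drop flagged families from a {family_id: sequence} mapping."""
    return {name: seq for name, seq in families.items() if name not in flagged}


def gene_id(index: int) -> str:
    """Format a gene model ID, assuming SNE_ plus a six-digit number."""
    return f"SNE_{index:06d}"
```

In this sketch, a family is removed as soon as any single hit falls below the cutoff, which matches the wording "any repeat with a significant hit"; the filtered mapping would then be written back to FASTA and passed to RepeatMasker as the custom library.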



