
Assembly data files Oryzias dopingdopingensis

Source: Mendeley Data. Updated: 2024-05-10. Indexed: 2024-06-30.
Download link:
https://zenodo.org/records/8064517
Description:
Repetitive elements in the genome sequence of O. dopingdopingensis were identified and masked with the following bioinformatic tools. Nucleotides were masked using the DUST algorithm (Kuzio et al., unpublished, but described in Morgulis et al., 2006) with dustmasker (version 1.0.0, part of blast+ 2.9.0) (Altschul et al., 1990; Camacho et al., 2009). Tandem repeats were identified with Tandem Repeats Finder (trf version 4.09) (Benson, 1999). A species-specific de novo repeat library was built with RepeatModeler v1.0.11 (http://www.repeatmasker.org/RepeatModeler/). Repeat elements were located in the genome sequence using RepeatMasker (version 4.1.0) (http://www.repeatmasker.org) with the de novo and Danio rerio libraries. The information from all four repeat analyses was merged and the genome was softmasked with bedtools (2.29.2) (Quinlan & Hall, 2010; PMID: 20110278; PMCID: PMC2832824). All repeat-masking steps were performed with scripts provided by the sigenae platform, following the workflow of Feron et al. (2020).
For gene identification, the masked genome was annotated with funannotate (Palmer & Stajich, 2019). The sequences were sorted by length with the ‘funannotate sort’ function, followed by gene prediction with ‘funannotate predict’. No training based on RNA-Seq data was performed, since none was available for this species. Additional external evidence from transcripts and proteins was added. As transcript evidence, gene predictions from Oryzias latipes (NCBI BioProject: PRJNA183868; Assembly: GCF_002234675.1) (Kasahara et al., 2007) and Oryzias melastigma (NCBI BioProject: PRJNA401159; Assembly: ASM292280v2) (Kim et al., 2018) were used. As protein evidence, a protein set from Oryzias javanicus (NCBI BioProject: PRJNA505405; Assembly: GCA_003999625.1) (Lee et al., 2020), manually annotated reference sequences from the UniProt Knowledgebase (UniProtKB/Swiss-Prot, Release 2020_02 of 22-Apr-2020, with 562,253 entries) (Apweiler et al., 2004), and a set of orthologous sequences generated in this study were used. Furthermore, the de novo gene predictors were trained with the BUSCO dataset actinopterygii_odb10. Gene prediction resulted in a total of 56,658 genes.
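The step in which the four repeat analyses are merged and the genome is softmasked can be illustrated with a minimal Python sketch. This is not the sigenae scripts or bedtools itself, only an illustration of the idea under stated assumptions: repeat hits are given as 0-based, half-open intervals (as in BED), overlapping or adjacent intervals are combined, and masked bases are written in lowercase. The function names `merge_intervals` and `softmask` and the toy intervals are hypothetical.

```python
from typing import Iterable, List, Tuple

# 0-based, half-open interval [start, end), following the BED convention.
Interval = Tuple[int, int]


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Combine overlapping or directly adjacent intervals into one."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def softmask(sequence: str, intervals: Iterable[Interval]) -> str:
    """Lowercase the bases inside the intervals; leave the rest unchanged."""
    parts: List[str] = []
    pos = 0
    for start, end in merge_intervals(intervals):
        parts.append(sequence[pos:start])          # unmasked stretch
        parts.append(sequence[start:end].lower())  # softmasked repeat
        pos = end
    parts.append(sequence[pos:])
    return "".join(parts)


if __name__ == "__main__":
    # Hypothetical hits from four repeat analyses on one toy sequence.
    dust = [(0, 4)]
    tandem = [(2, 6)]
    de_novo_library = [(10, 12)]
    danio_library = [(12, 14)]
    all_hits = dust + tandem + de_novo_library + danio_library
    print(merge_intervals(all_hits))
    print(softmask("ACGTACGTACGTACGT", all_hits))
```

Softmasking keeps the original bases, so downstream gene predictors can still read the sequence while treating lowercase regions as repeats.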
Created:
2023-07-26