Genome Annotation and Sequence Files for Ostrinia furnacalis

Source: Figshare · Updated: 2024-12-03 · Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/Genome_Annotation_and_Sequence_Files_for_i_Ostrinia_furnacalis_i_/27948936
Description:
Protein-coding genes in the O. furnacalis genome were predicted by integrating homology-based, transcript-based, and ab initio approaches. Homology-based predictions were made by aligning protein sequences from nine reference species (Anopheles gambiae, Chilo suppressalis, Danaus plexippus, Drosophila melanogaster, Helicoverpa armigera, O. furnacalis, Plutella xylostella, Bombyx mori, and Spodoptera frugiperda) to the assembled genome with Exonerate (v2.4.7). Transcript-based predictions used RNA-seq data: the reads were assembled into transcripts with Trinity (v2.11.0) 29, and gene models were predicted from these transcripts with PASA (v2.3.1) 30. In addition, the RNA-seq reads were aligned to the genome assembly with HISAT2 (v2.2.1) 31, and the alignments were assembled into transcripts with StringTie (v2.1.4) 32. These transcripts were then analyzed with TransDecoder (v5.5.0) (https://github.com/TransDecoder/TransDecoder/wiki) to identify candidate protein-coding regions. For ab initio prediction, Augustus (v3.3.3) 33 and GeneMark (v4.61) 34 were run with the transcript-based predictions supplied as hints.
Gene predictions from all three approaches were integrated with EvidenceModeler (v1.1.1) 30, yielding 16,272 predicted genes with an average of 6.44 exons per gene and an average exon length of 220.96 bp. BUSCO (v4) (BUSCO, RRID: SCR_015008) 35 analysis found 1,334 BUSCOs (97.6%) to be complete or fragmented, indicating a highly complete predicted gene set. Of these, 1,296 were complete and single-copy and 30 were complete and duplicated (Table S5), leaving 8 fragmented.
Functional annotation of the predicted protein-coding genes was performed with InterProScan (v5.55) 36 and DIAMOND (v2.0.14.152) 37; for DIAMOND, protein sequences were aligned to the UniProt-TrEMBL database with an E-value threshold of 1e-10 (‘-e 1e-10’).
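The gene-set summary reported above (gene count, mean exons per gene, mean exon length) can be recomputed from a GFF3 annotation file. The sketch below is a minimal illustration, not the authors' pipeline. It assumes that each exon points to its mRNA through the GFF3 Parent attribute and each mRNA points to its gene; the function names and the toy input are hypothetical. When a gene has several transcripts, it keeps only the transcript with the most exons, so shared exons are not double-counted.

```python
"""Recompute simple gene-structure statistics from a GFF3 annotation.

Illustrative sketch: assumes exons reference their mRNA via Parent=,
and each mRNA references its gene via Parent= (a common layout for
EvidenceModeler-style GFF3 output).
"""
from collections import defaultdict
from typing import Iterable


def parse_attributes(field: str) -> dict[str, str]:
    """Parse GFF3 column 9 ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def gene_structure_stats(lines: Iterable[str]) -> dict[str, float]:
    """Return gene count, mean exons per gene and mean exon length (bp)."""
    genes = set()
    mrna_to_gene = {}
    exons_by_mrna = defaultdict(list)  # mRNA ID -> list of exon lengths

    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == "gene" and "ID" in attrs:
            genes.add(attrs["ID"])
        elif ftype == "mRNA" and "ID" in attrs and "Parent" in attrs:
            mrna_to_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "exon" and "Parent" in attrs:
            # GFF3 coordinates are 1-based and inclusive.
            for parent in attrs["Parent"].split(","):
                exons_by_mrna[parent].append(end - start + 1)

    # Keep one representative transcript per gene: the one with most exons.
    best: dict[str, list[int]] = {}
    for mrna, lengths in exons_by_mrna.items():
        gene = mrna_to_gene.get(mrna)
        if gene not in genes:
            continue  # orphan mRNA or exon without a declared gene
        if gene not in best or len(lengths) > len(best[gene]):
            best[gene] = lengths

    exon_lengths = [n for lengths in best.values() for n in lengths]
    n_genes = len(genes)
    return {
        "genes": n_genes,
        "mean_exons_per_gene": len(exon_lengths) / n_genes if n_genes else 0.0,
        "mean_exon_length_bp": (
            sum(exon_lengths) / len(exon_lengths) if exon_lengths else 0.0
        ),
    }


# Toy two-gene GFF3 (hypothetical coordinates, for illustration only).
DEMO_GFF3 = """##gff-version 3
chr1\tEVM\tgene\t1\t1000\t.\t+\t.\tID=g1
chr1\tEVM\tmRNA\t1\t1000\t.\t+\t.\tID=g1.t1;Parent=g1
chr1\tEVM\texon\t1\t100\t.\t+\t.\tID=g1.t1.e1;Parent=g1.t1
chr1\tEVM\texon\t201\t500\t.\t+\t.\tID=g1.t1.e2;Parent=g1.t1
chr1\tEVM\tgene\t2001\t2300\t.\t-\t.\tID=g2
chr1\tEVM\tmRNA\t2001\t2300\t.\t-\t.\tID=g2.t1;Parent=g2
chr1\tEVM\texon\t2001\t2300\t.\t-\t.\tID=g2.t1.e1;Parent=g2.t1
"""

if __name__ == "__main__":
    print(gene_structure_stats(DEMO_GFF3.splitlines()))
```

On the toy input, the two genes carry three exons (100, 300 and 300 bp), so the sketch reports 1.5 exons per gene and a mean exon length of about 233.3 bp. Run on the real annotation file, it would be expected to land near the 6.44 exons per gene and 220.96 bp reported above only if the file follows the assumed Parent layout.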
A total of 14,116 genes (86.8%) received at least one functional annotation. Per-database results (Table S6) were: Pfam (11,224 genes, 69.0%), PANTHER (11,527 genes, 70.8%), Gene3D (9,683 genes, 59.5%), CDD (4,578 genes, 28.1%), SMART (5,167 genes, 31.8%), SUPERFAMILY (9,099 genes, 55.9%), MobiDBLite (6,513 genes, 40.0%), PROSITE patterns (3,280 genes, 20.2%), and PROSITE profiles (5,902 genes, 36.3%). All percentages are relative to the 16,272 predicted genes. No single database covered more than 70.8% of the genes (PANTHER), whereas the combined annotation covered 86.8%, which shows the benefit of integrating multiple annotation resources.
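The per-database counts and percentages above are simple set counts divided by the total number of predicted genes. The following is a minimal sketch of that bookkeeping, under stated assumptions: InterProScan output in TSV format (column 1 is the protein ID, column 4 the analysis name such as Pfam or PANTHER), DIAMOND output in the default 12-column BLAST tabular format (E-value in column 11), and protein IDs identical to gene IDs. The source does not say exactly how "annotated" was defined; this sketch assumes it means a hit from at least one source. All function names and demo rows are hypothetical.

```python
"""Per-database annotation coverage from InterProScan and DIAMOND tables.

Illustrative sketch. Assumptions: InterProScan TSV output (column 1 =
protein ID, column 4 = analysis name); DIAMOND tabular output in the
default 12-column BLAST format (E-value in column 11); protein IDs
equal gene IDs; "annotated" = hit from at least one source.
"""
from collections import defaultdict
from typing import Iterable

EVALUE_CUTOFF = 1e-10  # same value as DIAMOND's '-e 1e-10' in the text


def interproscan_hits(lines: Iterable[str]) -> dict[str, set[str]]:
    """Map each InterProScan analysis (database) to the IDs it hit."""
    by_db: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 4:
            by_db[cols[3]].add(cols[0])
    return dict(by_db)


def diamond_hits(lines: Iterable[str], cutoff: float = EVALUE_CUTOFF) -> set[str]:
    """Return query IDs with at least one hit at E-value <= cutoff."""
    hits = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 12 and float(cols[10]) <= cutoff:
            hits.add(cols[0])
    return hits


def coverage_table(total_genes, by_db, extra=frozenset()):
    """Count and percentage of genes per database, plus the union ('any')."""
    rows = {db: (len(ids), 100 * len(ids) / total_genes) for db, ids in by_db.items()}
    annotated = set(extra).union(*by_db.values())
    rows["any"] = (len(annotated), 100 * len(annotated) / total_genes)
    return rows


# Hypothetical demo rows; only the columns the sketch reads are meaningful.
IPR_DEMO = [
    "g1\tmd5\t300\tPfam\tPF00001\tdesc\t10\t200\t1e-20\tT\t03-12-2024",
    "g1\tmd5\t300\tPANTHER\tPTHR00001\tdesc\t1\t300\t1e-50\tT\t03-12-2024",
    "g2\tmd5\t150\tPfam\tPF00002\tdesc\t5\t120\t1e-8\tT\t03-12-2024",
]
DIAMOND_DEMO = [
    "g3\ttr|A0A000|X\t45.0\t200\t100\t2\t1\t200\t1\t200\t1e-30\t250.0",
    "g4\ttr|A0A001|Y\t30.0\t80\t50\t1\t1\t80\t1\t80\t1e-5\t40.0",
]

if __name__ == "__main__":
    table = coverage_table(5, interproscan_hits(IPR_DEMO), diamond_hits(DIAMOND_DEMO))
    for db, (n, pct) in sorted(table.items()):
        print(f"{db}\t{n}\t{pct:.1f}%")
```

Re-applying the E-value cutoff is redundant if DIAMOND was already run with '-e 1e-10', but it keeps the sketch correct for output produced with a looser threshold. The "any" row is a set union, so a gene hit by several databases is counted once, which is why the overall figure (86.8%) can exceed every single-database figure without the per-database counts summing to it.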
Created:
2024-12-03