
Genome Assembly and annotation of Acanthoscelides obtectus

Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP136079
Description:
For genome sequencing and assembly, we first subjected a line of A. obtectus to five consecutive generations of inbreeding by propagating a single female mated to a full-sib brother. Males of this inbred line were subsequently used for sequencing. Samples of whole-body high-molecular-weight genomic DNA were extracted (10 males per sample) and submitted to long-read sequencing using PacBio. Extractions were made using QIAGEN Genomic-tip 20/G, according to the manufacturer's protocol. High-molecular-weight DNA was sheared using the Megaruptor 2 system (Diagenode) with a 25 kb target. Size selection with a 15 kb cut-off was done using the Blue Pippin system (SAGE). SMRTbell Template Prep Kit 1.0 was used for library construction according to the manufacturer's instructions. Sequencing was performed using 21 SMRT cells on a Sequel I system, with 20 hr movies and V2 chemistry. Our sequencing effort yielded a total of 8,655,274 reads with an average read length of 10,176 bp (read-length N50: 16,250 bp), which corresponds to an average genomic coverage of approximately 80X. The genome was then assembled from the PacBio read data using FALCON v0.5.0 (https://github.com/PacificBiosciences/FALCON/) with default parameters. The assembly was subsequently error-corrected by one round of Arrow (SMART portal) based on re-alignment of the full set of PacBio reads. The resulting polished genome assembly is 1.1 Gb in total size and contains 6,654 contigs with an N50 of 791 kb. The genome annotation service at the National Bioinformatics Infrastructure Sweden (www.nbis.se) carried out the genome annotation using a comprehensive MAKER3 pipeline (Holt and Yandell, 2011). We created a species-specific repeat library modeled using the RepeatModeler package (1.0.8) (Smit and Hubley, 2010).
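The coverage and N50 figures reported above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that coverage is computed against the 1.1 Gb assembly size (the description does not say which genome-size estimate was used). The contig lengths in the N50 example are toy values, not data from this assembly.

```python
def mean_coverage(n_reads, mean_read_len_bp, genome_size_bp):
    """Average depth = total sequenced bases / genome size."""
    return n_reads * mean_read_len_bp / genome_size_bp


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Read statistics taken from the description; the 1.1 Gb genome size is an assumption.
coverage = mean_coverage(8_655_274, 10_176, 1.1e9)
print(f"approximate coverage: {coverage:.1f}X")

# Toy contig lengths (hypothetical) to show how an N50 is derived.
print("toy N50:", n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Under these assumptions the reads amount to roughly 88 Gb of sequence, which is consistent with the stated coverage of approximately 80X.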
A first round of annotation was performed with MAKER3 using both (1) curated protein sequences collected from the UniProt Swiss-Prot database (Magrane 2011) and (2) the extensive transcriptome data generated in the current study. This evidence-based gene build resulted in a first “release candidate” gene set (rc1) with 20,682 gene models. Evidence-based annotation is limited by the available sequence data, which can lead to fragmented gene models and missed genes. We therefore next performed an ab initio evidence-driven gene build. We selected a high-confidence set of genes, which was used to train the ab initio tools Augustus 2.7 (Stanke et al., 2006) and Snap 2006-07-28 (Korf, 2004). We also trained GeneMark-ET 4.3 (Lomsadze et al., 2014), a self-training method that integrates RNA-seq evidence using the junctions.bed file from TopHat. The ab initio evidence-driven annotation was performed with MAKER3, using both the output HMM models from the trained ab initio tools (Augustus, Snap, and GeneMark-ET) and the same evidence data as used previously. We also used EVidenceModeler (EVM) (Haas et al., 2008), which allowed us to build gene models from the best possible set of exons produced by the ab initio tools and to choose the most consistent ones according to the available evidence. The ab initio evidence-driven gene build (rc2) contained 35,123 gene models. Finally, all ab initio gene models (rc2) that mapped within an empty locus in the evidence-driven annotation (rc1) were added to rc1 to create our final build (rc3), containing 38,104 gene models.
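The final merge step (rc2 models added to rc1 only where rc1 has no gene) can be sketched as a simple interval check. This is a minimal illustration, not the pipeline actually used: the `Gene` type and function names are hypothetical, an "empty locus" is assumed to mean no coordinate overlap with any rc1 gene on the same contig, and strand is ignored. The gene models in the example are toy values.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    seqid: str    # contig name
    start: int    # 1-based, inclusive
    end: int      # 1-based, inclusive
    gene_id: str


def overlaps(a, b):
    """True if two genes share at least one base on the same contig."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def merge_into_empty_loci(rc1, rc2):
    """Keep every rc1 gene; add rc2 genes that overlap no rc1 gene (assumed rule)."""
    rc1_by_contig = defaultdict(list)
    for gene in rc1:
        rc1_by_contig[gene.seqid].append(gene)
    added = [
        gene for gene in rc2
        if not any(overlaps(gene, ref) for ref in rc1_by_contig.get(gene.seqid, []))
    ]
    return list(rc1) + added


# Toy gene models (hypothetical coordinates).
rc1 = [Gene("ctg1", 100, 500, "rc1_g1")]
rc2 = [
    Gene("ctg1", 400, 900, "rc2_g1"),    # overlaps rc1_g1, so it is dropped
    Gene("ctg1", 1000, 1500, "rc2_g2"),  # empty locus on ctg1, so it is added
    Gene("ctg2", 100, 500, "rc2_g3"),    # contig with no rc1 genes, so it is added
]
rc3 = merge_into_empty_loci(rc1, rc2)
print([gene.gene_id for gene in rc3])
```

If rc3 is read as rc1 plus the added rc2 models, the reported counts imply that 38,104 − 20,682 = 17,422 rc2 models fell within empty loci.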
Created:
2023-01-21