KMP assembly and gene annotation
Source: DataCite Commons. Updated: 2025-06-01. Indexed: 2024-08-19.
Download link:
https://figshare.com/articles/dataset/KMP_assembly_and_gene_annotation/25624221/3
Description:
Total DNA was extracted from a blood sample of a male Korean minipig (KMP; 27 months old), and DNA libraries for short, long, and Hi-C reads were constructed and sequenced. Contigs were generated by assembling PacBio subreads (79.78x) with Canu (v1.9), and only high-quality contigs supported by at least 50 subreads were retained for subsequent steps. The retained contigs were polished using GenomicConsensus (v2.3.3). To build a chromosome-level genome assembly, the contigs were scaffolded with RACA using short reads, long reads, and multiple reference genomes (Bama, Göttingen, Meishan, Duroc, Landrace, Large White, cow, and goat). Subsequently, Hi-C reads were aligned to the scaffolds using the Arima Hi-C mapping pipeline, and SALSA2 was run for Hi-C scaffolding. Finally, misassemblies were corrected and gaps were closed in two rounds of Pilon (v1.22) using the short-read data.

For gene annotation, RNA from 26 different tissues (appendix, backfat, bone marrow, brain, colon, forelimb muscle, groin, heart, hindlimb muscle, intestine, kidney, liver, lung, lymph node, nipple, pancreas, phren, pituitary gland, rib, sirloin, spinal cord, spleen, stomach, tenderloin, testis, and thymus) was extracted and sequenced on the Illumina platform. To annotate protein-coding genes, the RNA-seq data were mapped to the chromosome-level scaffolds of the KMP assembly using HISAT2 (v2.2.1). Using the RNA-seq data together with gene annotation data from other species (pig, human, cow, and goat), we predicted the protein-coding genes in the KMP assembly by running GeMoMa (v1.9). For non-coding genes, various types of non-coding RNAs, including rRNA, snRNA, and miRNA, were annotated using the Rfam database and Infernal (v1.1.3). Additionally, tRNA and rRNA were predicted with tRNAscan-SE (v2.0.5) and RNAmmer (v1.2), respectively. The final annotation was generated by merging all predictions using a Perl script (https://github.com/jkimlab/NCMD_study) provided by a previous study.

The sequencing read data for genome assembly and annotation are available from the NCBI SRA under BioProject PRJNA1104148.
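
The contig selection step described above (keeping only contigs supported by at least 50 subreads) can be illustrated with a minimal Python sketch. This is not the authors' script. It assumes Canu-style FASTA headers that carry a `reads=N` field, and the function names `parse_fasta` and `filter_by_read_support` as well as the toy records are hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: headers follow a Canu-style layout such as
# ">tig00000001 len=52311 reads=87 class=contig ...".
READS_FIELD = re.compile(r"\breads=(\d+)\b")


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_read_support(lines: Iterable[str], min_reads: int = 50):
    """Keep contigs whose header reports at least `min_reads` supporting reads."""
    for header, seq in parse_fasta(lines):
        match = READS_FIELD.search(header)
        if match and int(match.group(1)) >= min_reads:
            yield header, seq


# Toy input (made-up records, for illustration only).
example = """>tig00000001 len=12 reads=87 class=contig
ACGTACGTACGT
>tig00000002 len=8 reads=12 class=contig
ACGTACGT
>tig00000003 len=4 reads=50 class=contig
ACGT
""".splitlines()

kept = [header.split()[0] for header, _ in filter_by_read_support(example)]
print(kept)
```

On a real assembly, the same function could be fed an open file handle instead of the in-memory list. Contigs whose header lacks a `reads=` field are dropped in this sketch, which may or may not match how the original filtering treated them.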
Provider:
figshare
Created:
2024-07-22



