
NCMD assembly and gene annotation

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/NCMD_assembly/23708352
Description:
For genome assembly, we generated short and long reads from the DNA sample of an adult male Nanchukmacdon. We then constructed a genome assembly from the sequencing reads using a reference-guided approach. The raw PacBio subreads (80.14x) were assembled and polished to generate 1,942 high-quality contigs supported by at least 50 PacBio subreads. To generate a chromosome-level assembly, the polished contigs were further assembled with an improved version of RACA that can use both the genome information of related species and diverse types of sequencing data. The resulting assembly was polished once more using short reads to produce the final assembly.

For annotating protein-coding genes, RNA samples were prepared and sequenced from 24 different tissues of the same Nanchukmacdon individual that was used for whole-genome sequencing. Using a combination of ab initio and homology-based prediction approaches together with the RNA sequencing data, a total of 20,588 protein-coding genes with an average length of 47.06 Kbp were annotated in the NCMD assembly.

Non-coding genes for diverse types of RNAs, including rRNA, snRNA, and miRNA, were annotated using the Rfam database and Infernal (v.1.1.3). tRNAscan-SE (v.2.0.5) and RNAmmer (v.1.2) were used to annotate non-coding genes for tRNA and rRNA, respectively.

The sequencing read data for genome assembly and annotation are available from NCBI SRA under project PRJNA967127.
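
The gene count and average gene length reported in the description are summary statistics that can be recomputed from an annotation file. The following is a minimal sketch, assuming the annotation is available in GFF3 format (the description does not state the file format) and that protein-coding genes appear as `gene` features. The function name `gene_length_summary` and the example records are hypothetical and are not taken from the NCMD data. A real GFF3 file may also list non-coding genes as `gene` features, so additional filtering may be needed to reproduce the reported figures.

```python
from statistics import mean


def gene_length_summary(gff3_lines, feature_type="gene"):
    """Return (count, mean length in bp) for features of the given type.

    Hypothetical helper for illustration; assumes tab-separated GFF3 records.
    """
    lengths = []
    for line in gff3_lines:
        # Skip blank lines and GFF3 comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A valid GFF3 record has 9 columns; column 3 is the feature type.
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        start, end = int(cols[3]), int(cols[4])
        # GFF3 coordinates are 1-based and inclusive.
        lengths.append(end - start + 1)
    if not lengths:
        return 0, 0.0
    return len(lengths), mean(lengths)


# Made-up records for illustration only (not NCMD data).
example = [
    "##gff-version 3",
    "chr1\tsketch\tgene\t1001\t3000\t.\t+\t.\tID=gene1",
    "chr1\tsketch\tmRNA\t1001\t3000\t.\t+\t.\tID=mrna1;Parent=gene1",
    "chr1\tsketch\tgene\t5001\t9000\t.\t-\t.\tID=gene2",
]
count, avg = gene_length_summary(example)
print(count, avg / 1000)  # gene count and mean length in Kbp
```

To apply this to a file on disk, pass an open file handle, since the function iterates over lines.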
Created: 2023-07-19