NCMD assembly and gene annotation

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://figshare.com/articles/dataset/NCMD_assembly/23708352

Description:

For genome assembly, short and long sequencing reads were generated from the DNA of an adult male Nanchukmacdon, and the assembly was built with a reference-guided approach. The raw PacBio subreads (80.14x coverage) were assembled and polished, yielding 1,942 high-quality contigs, each supported by at least 50 PacBio subreads. To obtain a chromosome-level assembly, these polished contigs were further assembled with an improved version of RACA, which can use both the genomes of related species and diverse types of sequencing data. The resulting assembly was polished once more with the short reads to produce the final NCMD assembly.

For protein-coding gene annotation, RNA samples were prepared and sequenced from 24 tissues of the same Nanchukmacdon individual used for whole-genome sequencing. Combining ab initio and homology-based prediction with the RNA sequencing data, 20,588 protein-coding genes with an average length of 47.06 Kbp were annotated in the NCMD assembly. Non-coding RNA genes, including rRNA, snRNA, and miRNA genes, were annotated using the Rfam database and Infernal (v.1.1.3); tRNAscan-SE (v.2.0.5) and RNAmmer (v.1.2) were used to annotate tRNA and rRNA genes, respectively. The sequencing reads used for genome assembly and annotation are available from NCBI SRA under BioProject PRJNA967127.
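
Summary figures such as the gene count and average gene length can be recomputed from an annotation file. The sketch below assumes the annotation is distributed as tab-separated GFF3 with genes stored as `gene` features; the function name `summarize_genes` is hypothetical, and restricting the count to protein-coding genes may additionally require filtering on a file-specific attribute column, which is not shown here.

```python
import io
from typing import TextIO, Tuple


def summarize_genes(gff3: TextIO, feature_type: str = "gene") -> Tuple[int, float]:
    """Return (feature count, mean length in bp) for one GFF3 feature type.

    Assumption: standard 9-column GFF3. Coordinates are 1-based and
    inclusive, so a feature spans end - start + 1 bases.
    """
    count = 0
    total_bp = 0
    for line in gff3:
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more features
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue  # ignore malformed lines and other feature types
        start, end = int(cols[3]), int(cols[4])
        count += 1
        total_bp += end - start + 1
    mean_bp = total_bp / count if count else 0.0
    return count, mean_bp


if __name__ == "__main__":
    # Toy input: two genes of 1,000 bp and 3,000 bp plus one mRNA child feature.
    toy = io.StringIO(
        "##gff-version 3\n"
        "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1\n"
        "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1\n"
        "chr1\tsrc\tgene\t2001\t5000\t.\t-\t.\tID=g2\n"
    )
    n, mean_bp = summarize_genes(toy)
    print(n, mean_bp / 1000)  # prints "2 2.0" (gene count, mean length in Kbp)
```

Counting only `gene` rows avoids double-counting transcripts and exons, which share the same coordinates or nest inside each gene; the mean length therefore reflects gene spans including introns.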

Created:

2023-07-19