protein_coding_genes_fasta

Name: protein_coding_genes_fasta
Creator: figshare
Published: 2024-09-11 02:56:10
License: No description available

DataCite Commons | Updated 2024-09-11 | Indexed 2024-11-06

Download link:

https://figshare.com/articles/dataset/protein_coding_genes_fasta/26978401

Description:

We applied a hybrid de novo assembly approach based on Illumina short reads and Nanopore long reads. Short and long reads were assembled into contigs using MaSuRCA v4.0.8. For gap closing, the assembled contigs were scaffolded into draft genomes using HaploMerger2 v20180603. The resulting draft haploid genomes had total lengths of 40.3–69.3 Mbp, scaffold counts of 94–348, N50 values of 0.35–1.09 Mbp, and longest scaffolds of 2.1–4.6 Mbp, as calculated by QUAST v5.1.0rc1. We evaluated the gene completeness of the draft genomes using BUSCO v5.3.0. The BUSCO assessment showed that 90–98% (92.4 ± 2.3, average ± SD) of the orthologs conserved in Stramenopiles were present in the genome assemblies (sum of the percentages of single-copy and duplicated orthologs), suggesting that the draft genomes contain a sufficient repertoire of Stramenopiles genes. Organelle sequences were excluded from the assembly data, and repeat regions were masked before the assemblies were used for gene prediction. RNA-seq reads were mapped to the assembled genome sequences using HISAT2 v2.2.1 with default settings, and gene prediction was performed using Augustus v3.4.0 trained on protein sequence data from Thalassiosira pseudonana, the closest species to Skeletonema, resulting in the annotation of 15,275–21,376 protein-coding genes in the Skeletonema genomes.
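
As a rough illustration of two figures quoted in the description (the number of sequences in a FASTA file and the N50 statistic), the following Python sketch parses FASTA text and computes N50 from a list of sequence lengths. It is not the authors' pipeline: the statistics above were produced with QUAST, and the function names `parse_fasta` and `n50` and the toy records are assumptions made for this example only. N50 is taken here in its usual sense, the length L such that sequences of length L or longer cover at least half of the total length.

```python
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} mapping."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            tokens = line[1:].split()
            current = tokens[0] if tokens else ""
            records[current] = []
        elif current is not None:
            # Sequence lines may be wrapped; collect them and join later.
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that sequences >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


# Toy records (hypothetical, not taken from the dataset).
example = """>gene1 hypothetical protein
MKT
AAL
>gene2
MSTNP
""".splitlines()

records = parse_fasta(example)
print(len(records), "sequences")
print("N50 of toy lengths:", n50([len(seq) for seq in records.values()]))
```

To apply the sketch to a downloaded file, pass an open file handle to `parse_fasta` (for example `open("proteins.fasta")`, where the file name is a placeholder); scaffold-level N50 would need the genome assembly rather than the protein FASTA.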

Provider:

figshare

Created:

2024-09-11
