protein_coding_genes_fasta

DataCite Commons, updated 2024-09-11, indexed 2024-11-06
Download link:
https://figshare.com/articles/dataset/protein_coding_genes_fasta/26978401
Description:
We applied a hybrid de novo assembly approach that combines Illumina short reads and Nanopore long reads. Short and long reads were assembled into contigs with MaSuRCA v4.0.8. To close gaps, the contigs were then scaffolded into draft genomes with HaploMerger2 v20180603. QUAST v5.1.0rc1 reported the following statistics for the resulting draft haploid genomes: total lengths of 40.3–69.3 Mbp, 94–348 scaffolds, N50 values of 0.35–1.09 Mbp, and longest scaffolds of 2.1–4.6 Mbp. We evaluated gene completeness with BUSCO v5.3.0. Between 90% and 98% (mean ± SD: 92.4 ± 2.3%) of the orthologs conserved in Stramenopiles were present in these assemblies, counting both complete single-copy and complete duplicated BUSCOs. This suggests that the draft genomes contain a sufficient Stramenopile gene repertoire.

Before gene prediction, organelle sequences were removed from the assemblies and repeat regions were masked. RNA-seq reads were mapped to the assembled genome sequences with HISAT2 v2.2.1 using default settings. Genes were then predicted with Augustus v3.4.0, trained on protein sequence data from Thalassiosira pseudonana, the species most closely related to Skeletonema. This yielded 15,275–21,376 annotated protein-coding genes across the Skeletonema genomes.
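The contiguity figures quoted above (sequence count, total length, longest sequence, N50) can be recomputed from any FASTA file. Below is a minimal Python sketch of that calculation. It is not QUAST itself, and it assumes an uncompressed, plain-text FASTA input. The function names (`fasta_lengths`, `n50`, `summarize`, `summarize_file`) are hypothetical. N50 is the length L such that sequences of length L or longer together cover at least half of the total length. QUAST applies its own filters (for example, a minimum contig length), so its reported numbers may differ slightly.

```python
from typing import Dict, Iterable, List


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of each record in FASTA-formatted text lines.

    Sequence lines may be wrapped over several lines; they are summed per record.
    """
    lengths: List[int] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            lengths.append(0)  # a header line starts a new record
        elif lengths:
            lengths[-1] += len(line)  # add wrapped sequence to the current record
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that sequences >= L cover at least half the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer comparison avoids float rounding
            return length
    return 0  # empty input


def summarize(lengths: List[int]) -> Dict[str, int]:
    """QUAST-style summary (hypothetical helper, not QUAST's own output)."""
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "N50": n50(lengths),
    }


def summarize_file(path: str) -> Dict[str, int]:
    """Summarize an uncompressed FASTA file on disk (path is user-supplied)."""
    with open(path) as handle:
        return summarize(fasta_lengths(handle))


if __name__ == "__main__":
    # Toy assembly: record lengths 8, 7 (wrapped over two lines), 5, 3, 2.
    toy = ">ctg1\nACGTACGT\n>ctg2\nACGT\nACG\n>ctg3\nACGTA\n>ctg4\nACG\n>ctg5\nAC\n"
    print(summarize(fasta_lengths(toy.splitlines())))
    # prints {'sequences': 5, 'total_length': 25, 'longest': 8, 'N50': 7}
```

In the toy example the total is 25. The two longest records (8 + 7 = 15) are the first to reach half of that, so N50 is 7. When `summarize_file` is applied to this dataset's protein FASTA, `sequences` counts protein records. That equals the number of protein-coding genes only if each gene contributes exactly one protein sequence.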
Provider:
figshare
Created:
2024-09-11