protein_coding_genes_fasta
Source: DataCite Commons | Updated: 2024-09-11 | Indexed: 2024-11-06
Download link:
https://figshare.com/articles/dataset/protein_coding_genes_fasta/26978401
Description:
We applied a hybrid de novo assembly approach based on Illumina short reads and Nanopore long reads. Short and long reads were assembled into contigs using MaSuRCA v4.0.8. For gap closing, the assembled contigs were scaffolded into draft genomes using HaploMerger2 v20180603. The resulting draft haploid genomes had total lengths of 40.3–69.3 Mbp, scaffold numbers of 94–348, N50 values of 0.35–1.09 Mbp, and longest scaffolds of 2.1–4.6 Mbp, as calculated by QUAST v5.1.0rc1. We evaluated the gene completeness of the draft genomes using BUSCO v5.3.0. The BUSCO assessment showed that 90–98% (92.4 ± 2.3, mean ± SD) of the orthologs conserved in Stramenopiles were present in the genome assemblies (sum of the percentages of single-copy and duplicated BUSCOs), suggesting that the draft genomes contain a sufficient share of the stramenopile gene repertoire. Organelle sequences were excluded from the assembly data, and repeat regions were masked before the assemblies were used for gene prediction. RNA-seq reads were mapped to the assembled genome sequences using HISAT2 v2.2.1 with default settings, and gene prediction was performed using Augustus v3.4.0 trained with the protein sequence data of Thalassiosira pseudonana, which is the closest species to Skeletonema, resulting in 15,275–21,376 protein-coding genes being annotated in the Skeletonema genomes.
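The assembly metrics quoted above (scaffold count, total length, longest scaffold, N50) can be recomputed from any FASTA file. The following is a minimal sketch in Python, not the QUAST implementation; the function names `read_fasta`, `n50`, and `assembly_stats` are hypothetical and introduced here only for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths: List[int]) -> int:
    """Return the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def assembly_stats(lines: Iterable[str]) -> Dict[str, int]:
    """Summarize sequence count, total length, longest sequence, and N50."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    return {
        "count": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

To apply it to a file, pass an open handle, for example `assembly_stats(open("scaffolds.fasta"))`, where the file name is a placeholder. Values may differ slightly from a QUAST report, since QUAST by default ignores contigs shorter than a minimum length.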
Provider:
figshare
Created:
2024-09-11



