
Transcriptome and genome assemblies of <i>Blasia pusilla</i>. | Transcriptomics dataset | Genomics dataset

DataCite Commons | Updated 2025-06-01 | Indexed 2024-08-19
Transcriptomics
Genomics
Download link:
https://figshare.com/articles/dataset/Transcriptome_and_genome_assemblies_of_i_Blasia_pusilla_i_/26048638/1
Description:
Transcriptome data set: We combined all RNA-seq data (PRJNA1099914; SRA submissions SUB14374705, SUB14439959, SUB14439962) with the draft genome sequence to create reference transcripts for B. pusilla, which were later used for gene expression estimation. We generated both a genome-guided and a de novo Trinity transcriptome assembly using all collected RNA-seq reads (42 RNA libraries). We used the PASA pipeline to combine the de novo and genome-guided assemblies into a non-redundant set of transcripts and putative genes. We ran all transcripts through TransDecoder (https://github.com/TransDecoder/TransDecoder) to obtain their best ORF and peptide translation. To reduce the number of potential transcripts and gene models, we discarded all putative gene IDs without at least one complete ORF prediction. To identify potential contaminants in this filtered transcript file, we selected the longest ORF for each putative gene, searched these ORFs against the eggNOG database in TRAPID 2.0, and assessed their taxonomic assignment. The initial run indicated that most of the contamination came from yeast. To remove these transcripts, we searched all transcripts against the full set of Saccharomyces cerevisiae S288C (assembly R64) transcripts using blastn version 2.12.0+ and discarded all transcripts meeting the following filtering criteria: e-value ≤ 0.0001, similarity value ≥ 90%, and query coverage ≥ 80%. (The yeast transcript data were downloaded from https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_rna.fna.gz.) We then ran a further taxonomic assignment using TRAPID 2.0 and removed another, smaller set of genes mapping to Viruses, Bacteria, Archaea, Opisthokonta, and other Eukaryota. The final assembly had 22635 genes represented by 156239 transcripts.

Genome: Gametophytic cultures of B. pusilla grown in vitro on MS medium were used for genomic DNA extraction with the plant-EZ DNA extraction kit, and the DNA was sequenced using PacBio SMRT at the Institute of Biotechnology, University of Helsinki, Finland. Reads that mapped to organelle and cyanobacterial genomes using BLASR were removed. In addition to the PacBio sequencing, short-read sequencing was done to support hybrid genome assembly and polishing. Short reads were trimmed using fastp and assembled together with the filtered PacBio reads using the hybrid genome assembler MaSuRCA. The hybrid genome assembly was polished with the cleaned short reads using Pilon with a minimum read depth of 10. Finally, we used blobtools v.1.1.1, the NCBI nr database, and the average coverage (Illumina reads) of each scaffold to remove scaffolds of contaminant origin. After visual assessment, we kept scaffolds with a taxonomic assignment of Viridiplantae or Streptophyta. Scaffolds with other taxonomic assignments were discarded.
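The yeast-contamination filter described in the transcriptome section can be illustrated with a minimal sketch. It is not the authors' script: it assumes the blastn results were written in tabular form with the columns `qseqid sseqid pident evalue qcovs`, that "similarity value" corresponds to percent identity (`pident`), and that "query coverage" corresponds to `qcovs`. The function name `yeast_like_ids` and the example rows are hypothetical.

```python
from typing import Iterable, Set

# Thresholds taken from the description above.
MAX_EVALUE = 1e-4
MIN_IDENTITY = 90.0   # assumption: "similarity value" read as percent identity (pident)
MIN_QUERY_COV = 80.0  # assumption: "query coverage" read as qcovs


def yeast_like_ids(blast_rows: Iterable[str]) -> Set[str]:
    """Return query IDs with at least one hit passing all three thresholds.

    Each row is expected to be tab-separated with the columns
    qseqid, sseqid, pident, evalue, qcovs (an assumed layout).
    """
    flagged = set()
    for row in blast_rows:
        row = row.strip()
        # Skip blank lines and comment lines.
        if not row or row.startswith("#"):
            continue
        qseqid, _sseqid, pident, evalue, qcovs = row.split("\t")[:5]
        if (float(evalue) <= MAX_EVALUE
                and float(pident) >= MIN_IDENTITY
                and float(qcovs) >= MIN_QUERY_COV):
            flagged.add(qseqid)
    return flagged


# Made-up rows for illustration only; they are not results from the dataset.
example = [
    "tx1\tNM_001\t95.2\t1e-50\t92",  # passes all three thresholds
    "tx2\tNM_002\t85.0\t1e-30\t95",  # identity too low
    "tx3\tNM_003\t99.0\t1e-80\t40",  # query coverage too low
    "tx4\tNM_004\t91.0\t0.01\t88",   # e-value too high
]
print(sorted(yeast_like_ids(example)))
```

Transcripts whose IDs are returned would be the ones discarded as likely yeast contamination; all other transcripts would proceed to the second TRAPID 2.0 taxonomic screen.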
Provider:
figshare
Created:
2024-06-19