
Transcriptome sequencing of Daphnia galeata and combined assembly

Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://www.ncbi.nlm.nih.gov/sra/ERP016631
Description:
A total RNA sample from D. galeata, pooled from 24 clonal lines originating from four different lakes, was sequenced on the Illumina MiSeq platform. This produced a total of 40.6 million reads, i.e. roughly 20.3 million paired-end (PE) read pairs with a read length of 250 bp. For the de novo transcriptome assembly, multiple assemblies from four different programs (Trinity, SOAPdenovo, Oases-Velvet, Trans-ABySS) and different k-mer sizes were combined. The de novo assemblers produced between 100,749 (Trinity) and 489,649 (SOAPdenovo) contigs, with a combined total of 1,218,949 (table 1). Applying CD-HIT-EST where necessary considerably reduced the redundancy of the data set; 583,357 contigs were merged together and further processed with the EviGene pipeline. The EviGene pipeline, which is used to merge different assemblies, classified 32,903 transcripts into the okay-main set and 47,849 transcripts into the alternative set. No particular assembler stood out as delivering very few or very many transcripts, but there were differences among assemblers (table 1). Furthermore, Trinity was better at recovering the longest proteins: 532 of the 1000 longest proteins were obtained with this assembler. The number of transcripts obtained agrees well with the number of genes described in the related species D. pulex (30,810). In addition to the 32,903 transcripts, the tr2aacds script from the EviGene pipeline also produced a set of CDS and proteins.
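The counts quoted in the description can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: the constant names and the `summarize` helper are hypothetical and not part of the dataset or of any assembler, and every number is copied from the description itself.

```python
# All figures are copied from the dataset description; the names are
# illustrative and do not come from the dataset or from any pipeline.
TOTAL_READS_MILLIONS = 40.6      # total MiSeq reads
READ_PAIRS_MILLIONS = 20.3       # paired-end read pairs
COMBINED_CONTIGS = 1_218_949     # all assemblers combined
CONTIGS_TO_EVIGENE = 583_357     # contigs reported as merged and passed to EviGene
OKAY_MAIN = 32_903               # EviGene okay-main set
ALTERNATIVE = 47_849             # EviGene alternative set
D_PULEX_GENES = 30_810           # described genes in D. pulex
TRINITY_LONGEST = 532            # longest proteins recovered by Trinity
LONGEST_CONSIDERED = 1000        # size of the "longest proteins" set


def summarize() -> dict:
    """Derive a few ratios from the reported counts."""
    return {
        # Two reads per pair is expected for paired-end sequencing.
        "reads_per_pair": round(TOTAL_READS_MILLIONS / READ_PAIRS_MILLIONS, 2),
        # Okay-main plus alternative transcripts classified by EviGene.
        "evigene_transcripts_total": OKAY_MAIN + ALTERNATIVE,
        # Share of the combined contigs that went into EviGene.
        "share_of_contigs_to_evigene": round(CONTIGS_TO_EVIGENE / COMBINED_CONTIGS, 3),
        # Share of the 1000 longest proteins obtained with Trinity.
        "trinity_share_of_longest": TRINITY_LONGEST / LONGEST_CONSIDERED,
        # Okay-main transcripts relative to the D. pulex gene count.
        "okay_main_vs_d_pulex": round(OKAY_MAIN / D_PULEX_GENES, 3),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Under these figures, the okay-main and alternative sets add up to 80,752 transcripts, and the okay-main set is about 7% larger than the D. pulex gene count, which is in line with the "agrees well" statement in the description.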
Created:
2018-02-21