
Updated genome annotation for Physarum polycephalum

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4086118
Description:
This dataset contains a genome assembly (previously released by another group) for the slime mold Physarum polycephalum, along with our updated annotation file (see details below). The updated annotation reflects a significant enrichment in U12-type introns in P. polycephalum, as described in this preprint: https://doi.org/10.1101/2020.10.12.336362

We downloaded the P. polycephalum genome assembly and annotation from http://www.physarum-blast.ovgu.de/, and RNA-seq data for P. polycephalum from NCBI's SRA database (accession numbers DRR047256, ERR089824-ERR089827, and ERR557103-ERR557120).

To reannotate the genome, we combined de novo and reference-based approaches. First, we generated a de novo transcriptome from the aggregate RNA-seq data using Trinity (Grabherr et al. 2011). We also separately mapped the reads to the genome using HISAT2 (Kim et al. 2019), allowing for non-canonical splice sites (--pen-noncansplice 0), followed by StringTie (M. Pertea et al. 2016) to incorporate the mapped reads with the existing annotations and generate additional putative transcript structures. Coding-sequence annotations for the assembled transcripts, informed by additional homology information from the SwissProt (UniProt Consortium 2008) protein database, were generated using TransDecoder (Brian J. Haas et al. 2013) and further refined with the de novo transcriptome via PASA (B. J. Haas 2003). In addition, an AUGUSTUS (Stanke et al. 2008) annotation was generated from the mapped reads using BRAKER1 (Hoff et al. 2015), explicitly allowing for AT-AC splice boundaries (--allow_hinted_splicesites=atac). Lastly, the AUGUSTUS- and StringTie-based gene predictions were merged using gffcompare (G. Pertea and Pertea 2020) and updated again using PASA.

To gauge the quality of our annotations versus those previously available, we performed a BUSCO (Simão et al. 2015) analysis against conserved eukaryotic genes. The previous annotations contained matches to 60.1% of eukaryotic BUSCO groups (54.5% single-copy; 27.1% fragmented; 12.8% missing); our annotation increased this percentage to 73.3% (64.4% single-copy; 18.5% fragmented; 8.2% missing).
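The BUSCO figures above can be sanity-checked with a little arithmetic. The following is a minimal Python sketch, not part of the original analysis. It assumes that the headline "matches" percentage corresponds to BUSCO's "complete" class, which is conventionally the sum of single-copy and duplicated hits; under that assumption the duplicated fraction, which the description does not state, can be derived by subtraction. The class name BuscoSummary and its fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoSummary:
    """Percentages of BUSCO groups, as quoted in the dataset description."""

    complete: float     # headline "matches" value (ASSUMED to be the complete class)
    single_copy: float  # complete and single-copy
    fragmented: float
    missing: float

    @property
    def duplicated(self) -> float:
        # Derived, not reported: complete = single-copy + duplicated.
        return round(self.complete - self.single_copy, 1)

    @property
    def total(self) -> float:
        # Complete, fragmented, and missing should account for every group.
        return round(self.complete + self.fragmented + self.missing, 1)


# Values copied from the description above.
previous = BuscoSummary(complete=60.1, single_copy=54.5, fragmented=27.1, missing=12.8)
updated = BuscoSummary(complete=73.3, single_copy=64.4, fragmented=18.5, missing=8.2)

for label, summary in (("previous", previous), ("updated", updated)):
    print(
        f"{label}: complete={summary.complete}% "
        f"duplicated={summary.duplicated}% total={summary.total}%"
    )

# Improvement in the complete class, in percentage points.
gain = round(updated.complete - previous.complete, 1)
print(f"gain in complete BUSCO groups: {gain} percentage points")
```

Both sets of percentages sum to 100% under this reading, which suggests the quoted numbers are internally consistent. The derived duplicated fractions are an inference from that assumption rather than values reported by the authors.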
Created:
2020-10-14