Updated genome annotation for Physarum polycephalum
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4086118
Resource description:
This dataset contains a genome assembly (previously released by another group) for the slime mold Physarum polycephalum, along with our updated annotation file (see details below). The updated annotation reflects a significant enrichment in U12-type introns in P. polycephalum as described in this preprint: https://doi.org/10.1101/2020.10.12.336362
We downloaded the P. polycephalum genome assembly and annotation from http://www.physarum-blast.ovgu.de/, and RNA-seq data for P. polycephalum from NCBI’s SRA database (accession numbers DRR047256, ERR089824-ERR089827, and ERR557103-ERR557120).
To reannotate the genome, we combined de novo and reference-based approaches. First, we generated a de novo transcriptome from the aggregate RNA-seq data using Trinity (Grabherr et al. 2011). Separately, we mapped the reads to the genome using HISAT2 (Kim et al. 2019), allowing for non-canonical splice sites (--pen-noncansplice 0), and then ran StringTie (M. Pertea et al. 2016) to combine the mapped reads with the existing annotations and generate additional putative transcript structures. Coding-sequence annotations for the assembled transcripts, informed by additional homology information from the SwissProt (UniProt Consortium 2008) protein database, were generated using TransDecoder (Brian J. Haas et al. 2013) and further refined with the de novo transcriptome via PASA (B. J. Haas 2003). In addition, an AUGUSTUS (Stanke et al. 2008) annotation was generated from the mapped reads using BRAKER1 (Hoff et al. 2015), explicitly allowing for AT-AC splice boundaries (--allow_hinted_splicesites=atac). Lastly, the AUGUSTUS- and StringTie-based gene predictions were merged using gffcompare (G. Pertea and Pertea 2020) and updated again using PASA.
To gauge the quality of our annotation against the one previously available, we performed a BUSCO (Simão et al. 2015) analysis against conserved eukaryotic genes. The previous annotation contained complete matches to 60.1% of eukaryotic BUSCO groups (54.5% single-copy; 27.1% fragmented; 12.8% missing); our annotation increased this to 73.3% (64.4% single-copy; 18.5% fragmented; 8.2% missing).
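The BUSCO figures quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of the original workflow. It assumes the standard BUSCO convention that the headline percentage counts complete matches, that complete = single-copy + duplicated, and that complete + fragmented + missing = 100%. The duplicated percentages it derives are not reported in the description, and the dictionary layout and function name are illustrative only.

```python
from decimal import Decimal

# Percentages copied from the description; keys and layout are illustrative.
BUSCO = {
    "previous": {"complete": "60.1", "single": "54.5",
                 "fragmented": "27.1", "missing": "12.8"},
    "updated": {"complete": "73.3", "single": "64.4",
                "fragmented": "18.5", "missing": "8.2"},
}


def summarize(scores):
    """Return the category total and the implied duplicated percentage.

    Assumes complete = single-copy + duplicated (standard BUSCO convention).
    Decimal is used so that one-decimal percentages add up exactly.
    """
    s = {name: Decimal(value) for name, value in scores.items()}
    total = s["complete"] + s["fragmented"] + s["missing"]
    duplicated = s["complete"] - s["single"]
    return {"total": total, "duplicated": duplicated}


previous = summarize(BUSCO["previous"])
updated = summarize(BUSCO["updated"])

# Gain in complete BUSCO groups, in percentage points.
gain = Decimal(BUSCO["updated"]["complete"]) - Decimal(BUSCO["previous"]["complete"])

for label, summary in (("previous", previous), ("updated", updated)):
    print(f"{label}: total={summary['total']}%, "
          f"implied duplicated={summary['duplicated']}%")
print(f"gain in complete BUSCO groups: {gain} percentage points")
```

Under these assumptions both sets of figures sum to 100%, which supports reading 60.1% and 73.3% as the complete fractions, and the update corresponds to a gain of 13.2 percentage points in complete eukaryotic BUSCO groups.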
Created:
2020-10-14



