Predicted proteome of Paratrimastix pyriformis
Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/6405157
Description:
The upload contains a predicted proteome from a genomic assembly of the flagellate Paratrimastix pyriformis (Metamonada, Excavata). The publication describing the genomic study in detail is in progress. The final P. pyriformis genome was assembled into 650 scaffolds spanning 56,722,987 bp, with an N50 of 268,802 bp and a GC content of 60.92%. Manual and automatic gene prediction resulted in 13,532 predicted protein-coding genes, which are the subject of this upload. Below we briefly describe how the data were generated.
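The N50 and GC content quoted above are standard assembly summary statistics. The following is a minimal sketch of how such values are commonly computed, not the authors' own script. The function names and the toy sequences are illustrative assumptions, and the GC calculation assumes that ambiguous bases (such as N) are excluded from the denominator, which the description does not specify.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def gc_percent(sequences):
    """Return GC content as a percentage of unambiguous bases (A, C, G, T).

    Assumption: ambiguous bases such as N are ignored.
    """
    gc = at = 0
    for seq in sequences:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    return 100.0 * gc / (gc + at) if gc + at else 0.0


# Toy scaffolds (made-up data, not taken from the P. pyriformis assembly).
toy = ["GGCC" * 25, "ATGC" * 15, "GGCA" * 10, "AATT" * 5]
toy_n50 = n50([len(s) for s in toy])
toy_gc = gc_percent(toy)
print(toy_n50, round(toy_gc, 2))
```

Applied to the scaffold sequences of the assembly, the same two functions should give values comparable to those reported above, provided the same conventions were used.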
DNA isolation: A monoeukaryotic, xenic culture of P. pyriformis (strain RCP-MX, ATCC 50935) was maintained in Sonneborn's Paramecium medium (ATCC 802) at room temperature. The DNA was isolated from 15 litres of culture using two different kits. The gDNA samples for PacBio, Illumina HiSeq, and Illumina MiSeq sequencing were each isolated using the Qiagen DNeasy Blood & Tissue Kit (Qiagen). The isolated gDNA was then ethanol-precipitated to increase its concentration and remove contaminants. For nanopore sequencing, the DNA was isolated using the Qiagen MagAttract HMW DNA Kit (Qiagen) according to the manufacturer’s protocol.
Sequencing: We used three platforms to generate the sequence data: PacBio (RSII sequencer), Illumina (HiSeq and MiSeq), and Oxford Nanopore (MinION Mk1B, two flow cells).
Assembly: Sequencing quality was assessed with FastQC (Andrews 2010). For the Illumina data, adapter and quality trimming was performed using Trimmomatic 0.36 (Bolger et al. 2014), with a quality threshold of 15. For the nanopore data, trimming and removal of chimeric reads were performed using Porechop v0.2.3 (https://github.com/rrwick/Porechop). The initial assembly was generated from the nanopore and PacBio reads only, using the Canu v1.7.1 assembler (Koren et al. 2017) with corMinCoverage and corOutCoverage set to 0 and 100000, respectively. After assembly, the data were binned using tetraESOM (Haddad et al. 2009). The resulting eukaryotic bins were also checked using a combination of BLASTn and BLASTp and a scoring strategy based on the identity and coverage of each scaffold, as described in Treitli et al. (2019). After binning, the resulting genomic bins were polished in two phases. In the first phase, the scaffolds were polished with Nanopolish (Loman et al. 2015) using the raw nanopore reads. In the second phase, the Nanopolish-corrected scaffolds were further corrected with Pilon v1.21 (Walker et al. 2014) using the Illumina short reads. Finally, the genome assembly of P. pyriformis was further scaffolded with raw RNA-seq reads using Rascaf (Song et al. 2016).
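Binning of this kind separates sequences from different organisms by their tetranucleotide composition. The sketch below illustrates only the underlying signature, a vector of 256 relative tetranucleotide frequencies per sequence. It is not the tetraESOM implementation: the function name and the example sequence are assumptions made for illustration, and the normalization and reverse-complement handling that a real binning tool may apply, as well as the self-organizing map step itself, are left out.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(seq):
    """Return a 256-element list of relative tetranucleotide frequencies.

    Overlapping windows are counted; windows containing characters other
    than A, C, G, T (for example N) are skipped.
    """
    s = seq.upper()
    counts = Counter(s[i:i + 4] for i in range(len(s) - 3))
    valid = {k: counts.get(k, 0) for k in KMERS}
    total = sum(valid.values())
    return [valid[k] / total if total else 0.0 for k in KMERS]


toy_scaffold = "ACGTACGTNACGT"  # made-up sequence for illustration
freqs = tetranucleotide_frequencies(toy_scaffold)
```

Sequences from the same genome tend to have similar frequency vectors, which is what allows a clustering method to separate eukaryotic scaffolds from those of the co-cultured bacteria in a xenic culture.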
Gene prediction: For de novo gene prediction, we first re-trained Augustus using a manually curated set of gene models. After re-training, intron hints were generated from the RNA-seq data and gene prediction was performed on the repeat-masked genome using Augustus 3.2.3 (Stanke and Waack 2003). To polish the predicted genes, we mapped the transcriptome assemblies to the genome using PASA (Haas et al. 2003) and used the PASA-assembled transcripts as evidence for gene model polishing with EVM (Haas et al. 2008).
Protein annotation: Automatic annotation of the proteins was performed using the KEGG Automatic Annotation Server (Moriya et al. 2007) and similarity searches using BLAST against the NCBI nr protein database. Manual annotation was performed by searching the predicted proteome using BLAST and HMMER (Finn et al. 2011). Proteins of interest were investigated manually and, where possible, their gene models were manually corrected.
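A predicted proteome of this kind is normally distributed as a protein FASTA file. The following is a minimal sketch of how such a file could be read and summarized using only the Python standard library. The record names and sequences are made up, and the file name and layout of the actual upload are not stated in this description, so both are assumptions here.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records standing in for the real proteome file.
example = io.StringIO(">protein_1 hypothetical\nMKT\nLLV*\n>protein_2\nMSA\n")
records = list(read_fasta(example))
# Strip a trailing stop-codon symbol, if present, before measuring length.
lengths = [len(seq.rstrip("*")) for _, seq in records]
print(len(records), lengths)
```

To run this on the downloaded data, replace the `io.StringIO` object with an opened file handle. If the file holds one sequence per predicted gene, the record count should match the 13,532 genes reported above.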
Created:
2022-04-01



