
Updated short variant, gene and transposable element predictions for P. nodorum isolates.

Source: DataCite Commons. Updated: 2021-11-02. Indexed: 2024-08-17.
Download link:
https://figshare.com/articles/dataset/Updated_gene_and_transposable_element_predictions_for_P_nodorum_isolates_SN15_SN4_SN79_and_SN2000/13340975
Description:
Gene, repeat, and transposable element predictions for the <i>Parastagonospora nodorum</i> isolates analysed.<br>
Short variant predictions for the <i>P. nodorum</i> pangenome relative to SN15 are also included in VCF format (<b>combined.vcf.gz</b>).<br>
Many of these isolates, including SN15, are also deposited at NCBI under BioProjects https://www.ncbi.nlm.nih.gov/bioproject/PRJNA612761 and https://www.ncbi.nlm.nih.gov/bioproject/PRJNA686477.<br>
The NCBI assemblies differ slightly because of NCBI submission requirements. The assemblies here are the same as the ones analysed in the manuscript. The file <b>changes.tsv</b> contains details of scaffold splits and removed genes from Illumina-sequenced isolates. The file <b>gene_changes.tsv</b> includes further details of removed genes in the Illumina-sequenced isolates. The only major changes to SN15 (other than removal of low-confidence genes) are that 100 bp of Ns were removed from the 5' end of Chromosome 7 (so all features on that chromosome are shifted by 100 bp), and one gene was pseudogenised due to an internal stop codon.<br>
The isolates sequenced as part of Richards 2018 (https://doi.org/10.1534/g3.117.300462; SN4, SN2000, SN79) and Syme 2018 (https://doi.org/10.1093/gbe/evy192; RSID*) are deposited here only, as we don't control the NCBI entries and there are currently no publicly available annotations for these genomes.<br>
Each zipped folder will unzip to contain something like the following structure:<br>
20200513-14FG141-genome.fasta # The actual assembly.<br>
20200513-14FG141-genome.fasta.md5<br>
20200513-14FG141-genome_contigs.gff3 # Mapping contigs to scaffolds.<br>
20200513-14FG141-genome_contigs.gff3.md5<br>
20200513-14FG141-genome_softmasked.fasta # A soft-masked version of the assembly for further gene prediction.<br>
20200513-14FG141-genome_softmasked.fasta.md5<br>
20200513-14FG141-mitochondrial.fasta # The mitochondrial assembly for this isolate.<br>
20200513-14FG141-mitochondrial.fasta.md5<br>
20200513-14FG141-mitochondrial_contigs.gff3<br>
20200513-14FG141-mitochondrial_contigs.gff3.md5<br>
20200513-14FG141-repeats.gff3 # Repeat and transposable element annotations.<br>
20200513-14FG141-repeats.gff3.md5<br>
20200519-14FG141-genes.gff3 # Protein coding, rRNA, and tRNA genes for this assembly.<br>
20200519-14FG141-genes.gff3.md5<br>
20200519-14FG141-proteins.fasta # Extracted protein sequences.<br>
20200519-14FG141-proteins.fasta.md5<br>
20200519-14FG141-transcripts.fasta # Extracted exon nucleotide sequences (note that these are not CDSs).<br>
20200519-14FG141-transcripts.fasta.md5<br>
CHANGELOG.txt # Details of changes and how files were created.<br><br>
Transposable elements and repeats were predicted using the [PanTE](https://github.com/darcyabjones/pante) pipeline. Protein coding genes were predicted from softmasked genomes using the [panann pipeline](https://github.com/darcyabjones/panann).<br>
Protein coding genes overlapping rRNA genes by more than 50% of their length were excluded.<br>
Protein coding genes with exons overlapping genome gaps (stretches of N &gt;= 100 bp) were split into fragments, annotated in the GFF with the attribute `fragmented=true`. Some predicted protein coding genes looked dubious (many short exons). We attempted to mark these, based on the support they have, in the GFF with the attribute `low_confidence_prediction=true`. This attribute can be removed manually if the gene looks fine to you.<br>
The rRNA genes were predicted using RNAmmer v1.2, with some predictions coming from RepeatMasker. The tRNA genes were predicted using tRNAscan-SE v2.0.3.<br><br>
The SN15 annotations are an updated version of the ones published in Bertazzoni et al. (https://doi.org/10.1186/s12864-021-07699-8). Newly predicted genes were added as a "C" set (in addition to the existing A and B sets) if they didn't overlap an existing annotation on the same strand by more than 20% of the length of the previous annotation. All protein coding gene annotations have an attribute `confidence_set=`, which indicates the A, B (previous), or C (updated) gene predictions.
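
The GFF attributes described above (`fragmented`, `low_confidence_prediction`, `confidence_set`) can be used to filter the gene predictions. The following is a minimal Python sketch, not part of the dataset or the panann pipeline. It assumes that the attributes sit on `gene` features and that `confidence_set` holds the literal values `A`, `B`, or `C`; check a real `*-genes.gff3` file before relying on either assumption. The function names, the sample records, the gene IDs, and the `example` source column are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional, Set


class Feature(NamedTuple):
    seqid: str
    type: str
    start: int
    end: int
    strand: str
    attributes: Dict[str, str]


def parse_attributes(column: str) -> Dict[str, str]:
    """Split a GFF3 attribute column (key=value;key=value) into a dict."""
    attributes: Dict[str, str] = {}
    for field in column.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from GFF3 lines, skipping comments and blank lines."""
    for line in lines:
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not features
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        yield Feature(cols[0], cols[2], int(cols[3]), int(cols[4]),
                      cols[6], parse_attributes(cols[8]))


def select_genes(
    features: Iterable[Feature],
    confidence_sets: Optional[Set[str]] = None,
    drop_low_confidence: bool = True,
    drop_fragmented: bool = True,
) -> List[Feature]:
    """Keep gene features that pass the attribute filters.

    Assumption: the attributes are attached to `gene` rows.
    """
    kept = []
    for feature in features:
        if feature.type != "gene":
            continue
        attrs = feature.attributes
        if drop_low_confidence and attrs.get("low_confidence_prediction") == "true":
            continue
        if drop_fragmented and attrs.get("fragmented") == "true":
            continue
        if confidence_sets is not None and attrs.get("confidence_set") not in confidence_sets:
            continue
        kept.append(feature)
    return kept


# Hypothetical records for illustration only; not taken from the dataset.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "scaffold_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1;confidence_set=A",
    "scaffold_1\texample\tgene\t1500\t2100\t.\t-\t.\tID=gene2;confidence_set=C;low_confidence_prediction=true",
    "scaffold_2\texample\tgene\t50\t700\t.\t+\t.\tID=gene3;confidence_set=C;fragmented=true",
    "scaffold_2\texample\tgene\t900\t1800\t.\t+\t.\tID=gene4;confidence_set=B",
])

selected = select_genes(parse_gff3(EXAMPLE_GFF3.splitlines()))
print([feature.attributes["ID"] for feature in selected])  # ['gene1', 'gene4']
```

To run this on a real file, pass an open file handle instead of the sample lines, for example `select_genes(parse_gff3(open("20200519-14FG141-genes.gff3")))`. Passing `confidence_sets={"C"}` would restrict the result to the newly added SN15 predictions, assuming the attribute values are as described.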
Provider:
figshare
Created:
2020-12-07