Sequences and annotations of a provisional genome draft of a Senegalese sole female (Sosen1) and a male (Sse05_10M)
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Sequences_and_annotations_of_a_provisional_genome_draft_of_a_Senegalese_sole_female/12472100
Resource description:
Information as of 2018 on a female Senegalese sole genome (Sosen1) after Nanopore sequencing. Unzip the archive
1) Sosen1_genome_draft.zip to find:
• Sosen1_genome_scaffolds.fasta containing every contig and scaffold identifier and sequence in fasta format.
• Sosen1_genome_annotation.gff3 corresponding to a provisional annotation of the genome contigs and scaffolds in Sosen1_genome_scaffolds.fasta using MAKER2 and transcript sequences in SOLSEv5.0.
• Sosen1_maker.transcripts.fasta containing the deduced transcripts from the gff3 annotation file.
• Sosen1_maker.proteins.fasta containing the deduced amino acid sequence for all transcripts in Sosen1_maker.transcripts.fasta.
• Sosen1_maker.proteins_annotation.tsv containing a complete annotation of the transcripts and proteins in the two files above, performed with our software Full-LengtherNext. This includes transcript and protein lengths, best UniProtKB orthologue with identity % and E-value, structural status, open reading frame location in the transcript, description, GO terms, KEGG codes, InterPro IDs, Pfam, EC and UniPathway, as tab-separated values (tsv format).
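As a quick sanity check on the unzipped scaffolds file, the sequences can be read and summarised with a few lines of Python. The sketch below is only an illustration using the standard library: the helper names read_fasta, n50 and summarize are hypothetical, and the path in the usage note assumes the archive was unzipped into the current directory.

```python
def read_fasta(handle):
    """Yield (identifier, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token as the identifier.
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n50(lengths):
    """Return the N50: the largest length L such that sequences of
    length >= L together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(path):
    """Return (number of sequences, total length, N50) for a FASTA file."""
    with open(path) as handle:
        lengths = [len(seq) for _, seq in read_fasta(handle)]
    return len(lengths), sum(lengths), n50(lengths)
```

For example, summarize("Sosen1_genome_scaffolds.fasta") would return the number of contigs and scaffolds, their total length and the N50 of the draft.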
The Sosen1 (or SENf1A) female genome was reannotated in 2020. The data are in the file
2) Sosen1_female_reannotation_2020.zip which, once unzipped, provides the following files:
• SENf1A.gff3.gz --> gff3 file with the protein coding annotation
• SSENf1A.stats.txt.gz --> Stats of the protein-coding annotation
• SSENf1A.transcripts.fa.gz --> multifasta file with the protein-coding annotated transcripts
• SSENf1A.pep.fa.gz --> amino acid sequence of the annotated proteins
• SSENf1A.cds.fa.gz --> nucleotide coding sequence (CDS) of the annotated proteins
• SSENf1A.longestpeptide.fa.gz --> amino acid sequence of the longest protein annotated for each gene
• SSENf1ncA.gff3.gz --> gff3 file with the non-coding annotation
• SSENf1ncA.transcripts.fa.gz --> multifasta file with the non-coding transcripts
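The reannotation files are gzip-compressed and can be read without decompressing them to disk. A minimal sketch, assuming the files follow the standard nine-column GFF3 layout (the helper names open_text and count_feature_types are hypothetical), counts how many features of each type an annotation contains:

```python
import gzip
from collections import Counter


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_feature_types(lines):
    """Count GFF3 feature types (column 3), skipping comments and blanks."""
    counts = Counter()
    for line in lines:
        # An optional embedded FASTA section ends the annotation part.
        if line.startswith("##FASTA"):
            break
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 9:
            counts[fields[2]] += 1
    return counts
```

A typical call would be count_feature_types(open_text("SSENf1ncA.gff3.gz")); the result can be compared against the figures reported in the corresponding stats file.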
Information as of 2020 on a male Senegalese sole genome, Sse05_10M (or Sosen2 or SSENm1B), after hybrid sequencing and assembly.
3) Sosen2_male_genome_scaffolds.fasta contains the genome scaffolds
4) Sosen2_annotations.zip contains the male genome integrated with genetic markers to provide linkage groups as chromosome surrogates, as well as gene annotations in the following files:
• Male_LA_Total.fasta.gz --> male genome assembly
• SSENm1B.gff3.fz --> gff3 file with the protein coding annotation
• SSENm1B.stats.txt.gz --> Stats of the protein-coding annotation
• SSENm1B.transcripts.fa.gz --> multifasta file with the protein-coding annotated transcripts
• SSENm1B.pep.fa.gz --> amino acid sequence of the annotated proteins
• SSENm1B.cds.fa.gz --> nucleotide coding sequence (CDS) of the annotated proteins
• SSENm1B.longestpeptide.fa.gz --> amino acid sequence of the longest protein annotated for each gene
• SSENm1ncB.gff3.gz --> gff3 file with the non-coding annotation
• SSENm1ncB.transcripts.fa.gz --> multifasta file with the non-coding transcripts
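The "longest protein for each gene" file implies a gene-to-transcript relationship, which GFF3 encodes through the ID and Parent attributes in column 9. The sketch below shows one way to recover that mapping. It assumes that transcripts are recorded as features of type mRNA with a single Parent, which has not been checked against these particular files, and the helper names parse_attributes and transcripts_by_gene are hypothetical.

```python
from collections import defaultdict
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in column9.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            # GFF3 percent-encodes reserved characters in values.
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def transcripts_by_gene(lines, transcript_type="mRNA"):
    """Map each parent gene ID to the list of its transcript IDs.

    Assumes each transcript has one Parent; a comma-separated Parent
    list would be kept as a single key here.
    """
    genes = defaultdict(list)
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != transcript_type:
            continue
        attrs = parse_attributes(fields[8])
        if "ID" in attrs and "Parent" in attrs:
            genes[attrs["Parent"]].append(attrs["ID"])
    return dict(genes)
```

Genes with more than one entry in the returned mapping are those for which the longest-peptide file would be expected to keep a single representative protein.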
Created:
2020-06-12



