Sequences and annotations of a provisional genome draft of a Senegalese sole female (Sosen1) and a male (Sse05_10M)
Source: Figshare. Updated 2020-06-12. Indexed 2026-04-28.
Download link:
https://figshare.com/articles/dataset/Sequences_and_annotations_of_a_provisional_genome_draft_of_a_Senegalese_sole_female/12472100
Description:
Information as of 2018 for a female Senegalese sole genome (Sosen1) after Nanopore sequencing.
1) Sosen1_genome_draft.zip, which unzips to:
• Sosen1_genome_scaffolds.fasta: every contig and scaffold identifier and sequence, in FASTA format.
• Sosen1_genome_annotation.gff3: a provisional annotation of the genome contigs and scaffolds in Sosen1_genome_scaffolds.fasta, produced with MAKER2 and the transcript sequences in SOLSEv5.0.
• Sosen1_maker.transcripts.fasta: the transcripts deduced from the gff3 annotation file.
• Sosen1_maker.proteins.fasta: the deduced amino acid sequences for all transcripts in Sosen1_maker.transcripts.fasta.
• Sosen1_maker.proteins_annotation.tsv: a complete annotation of the transcripts and proteins above, performed with our software Full-LengtherNext. It includes transcript and protein lengths, best UniProtKB orthologue with identity % and E-value, structural status, open reading frame location in the transcript, description, GOs, KEGG codes, InterPro IDs, Pfam, EC and Unipathway, as tab-separated values (tsv format).
The Sosen1 (or SENf1A) female genome was reannotated in 2020.
2) Sosen1_female_reannotation_2020.zip, which unzips to:
• SENf1A.gff3.gz --> gff3 file with the protein-coding annotation
• SSENf1A.stats.txt.gz --> statistics of the protein-coding annotation
• SSENf1A.transcripts.fa.gz --> multifasta file with the protein-coding annotated transcripts
• SSENf1A.pep.fa.gz --> amino acid sequences of the annotated proteins
• SSENf1A.cds.fa.gz --> nucleotide coding sequences of the annotated proteins
• SSENf1A.longestpeptide.fa.gz --> amino acid sequence of the longest protein annotated for each gene
• SSENf1ncA.gff3.gz --> gff3 file with the non-coding annotation
• SSENf1ncA.transcripts.fa.gz --> multifasta file with the non-coding transcripts
Information as of 2020 for a male Senegalese sole genome, Sse05_10M (or Sosen2 or SSENm1B), after hybrid sequencing and assembly.
3) Sosen2_male_genome_scaffolds.fasta contains the genome scaffolds.
4) Sosen2_annotations.zip contains the male genome integrated with genetic markers to provide linkage groups as chromosome surrogates, as well as gene annotations, in the following files:
• Male_LA_Total.fasta.gz --> male genome assembly
• SSENm1B.gff3.fz --> gff3 file with the protein-coding annotation
• SSENm1B.stats.txt.gz --> statistics of the protein-coding annotation
• SSENm1B.transcripts.fa.gz --> multifasta file with the protein-coding annotated transcripts
• SSENm1B.pep.fa.gz --> amino acid sequences of the annotated proteins
• SSENm1B.cds.fa.gz --> nucleotide coding sequences of the annotated proteins
• SSENm1B.longestpeptide.fa.gz --> amino acid sequence of the longest protein annotated for each gene
• SSENm1ncB.gff3.gz --> gff3 file with the non-coding annotation
• SSENm1ncB.transcripts.fa.gz --> multifasta file with the non-coding transcripts
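The FASTA files in these archives (plain .fasta or gzip-compressed .fa.gz) can be checked quickly after download. The following is a minimal sketch using only the Python standard library. The helper names open_text, read_fasta and summarize_fasta are hypothetical and not part of the dataset, and the sketch assumes that files ending in .gz are standard gzip files.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterator, Tuple


def open_text(path):
    """Open a plain or gzip-compressed text file for reading (assumes .gz means gzip)."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_fasta(path) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTA or multifasta file."""
    name, chunks = None, []
    with open_text(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                # The identifier is the header text up to the first whitespace.
                header = line[1:].split()
                name, chunks = (header[0] if header else ""), []
            elif name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def summarize_fasta(path) -> Dict[str, int]:
    """Return the number of sequences, their total length and the longest length."""
    lengths = [len(seq) for _, seq in read_fasta(path)]
    return {
        "sequences": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths, default=0),
    }
```

For example, after unzipping archive 2, summarize_fasta("SSENf1A.pep.fa.gz") would report how many annotated proteins the file holds. Reading sequence by sequence keeps memory use low for the transcript and protein files, although each scaffold is still held in memory in full while it is processed.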
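The gff3 annotation files listed above can be inspected in a similar way. As a rough sketch, the function below tallies the feature types (column 3 of the nine tab-separated GFF3 columns) in a plain or gzip-compressed file. The name count_feature_types is hypothetical, and which feature types actually occur in these files is not stated in the description, so no expected counts are given.

```python
import gzip
from collections import Counter


def count_feature_types(path) -> Counter:
    """Count GFF3 records per feature type (third column)."""
    # Assumption: a path ending in .gz is a standard gzip file.
    opener = gzip.open if str(path).endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # A ##FASTA directive marks embedded sequences; no features follow it.
            if line.startswith("##FASTA"):
                break
            # Skip comments, other directives and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) != 9:
                continue  # not a well-formed GFF3 record
            counts[fields[2]] += 1
    return counts
```

Calling count_feature_types("Sosen1_genome_annotation.gff3") returns a Counter keyed by feature type, which gives a quick cross-check against the accompanying stats files.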
Created:
2020-06-12



