Additional file 1: Table S1. of FRAMA: from RNA-seq data to annotated mRNA assemblies

Name: Additional file 1: Table S1. of FRAMA: from RNA-seq data to annotated mRNA assemblies
Creator: Figshare
Published: 2024-12-13 11:45:48
License: No description available

DataCite Commons | Updated: 2024-12-13 | Indexed: 2024-07-25

Download link:

https://springernature.figshare.com/articles/dataset/Additional_file_1_Table_S1_of_FRAMA_from_RNA-seq_data_to_annotated_mRNA_assemblies/4399325

Description:

Table S1: List of external software.

Table S2: Naked mole-rat (NMR) transcript data set TCUR, and orthologous transcripts from human, mouse and guinea pig. Multi-species mRNA alignments were constructed independently from those described in the main text, using the sequence database entries as listed.

Table S3: Naked mole-rat samples for strand-specific RNA-seq, and the RNA-seq data produced.

Table S4: Pairwise transcript sequence identities between NMR and related mammals. The analysis is based on 142 multiple sequence alignments of the CDSs of NMR, guinea pig, human and mouse (as listed in Additional file 1: Table S2). Identity values were computed based on gap-masked alignments.

Table S5: Statistics of the transcriptome data produced by Trinity (column “transcript assembly”) and subsequently processed using FRAMA (column “transcript catalog”).

Table S6: CEGMA results on transcriptome datasets. As defined by CEGMA, ‘complete proteins’ are recovered with >70 % in comparison to CEGMA’s core proteins. ‘Partial proteins’ additionally include proteins that exceed a certain alignment score threshold. CEGMA’s software components were used as suggested: geneid (v1.4), genewise (wise2.2.3-rc7), hmmer (HMMER 3.0), NCBI BLAST+ (2.2.25).

Table S7: Source of transcript sequence sets and underlying input data.

Table S8: Transcript-genome alignment statistics of the curated dataset (TCUR) in hetgla1. The alignments comprise 1473 well-aligned blocks and 81 unaligned or mismatching blocks. Transcripts show 99.9 % average identity within well-aligned blocks.

Table S9: Transcript-genome alignment statistics of the curated dataset (TCUR) in hetgla2. The alignments comprise 1525 well-aligned blocks and 16 unaligned or mismatching blocks. Transcripts show 99.9 % average identity within well-aligned blocks.

Table S10: Correspondence of gene symbols between transcript sets. The evaluation considered gene loci overlapping in the hetgla2 genome sequence, where all transcript-genome alignments of a gene were considered to define the gene locus. Only genes with ascertained function (non-LOC gene symbol) were compared.

Table S11: Accession numbers of sequences that are shown in the genome-based transcript map (hetgla2, scaffold JH602043; Fig. 4). Accession numbers for each sequence are listed in the same order as shown in Fig. 4 (from top to bottom).

(XLSX 77 kb)
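
The description of Table S4 says that identity values were computed on gap-masked alignments but does not spell out the procedure. The following is a minimal sketch of one plausible reading, not the authors' method: it assumes that "gap-masked" means dropping every alignment column in which at least one species has a gap, and that identity is the percentage of matching positions over the remaining columns. The function names and the toy sequences are hypothetical and are not taken from the dataset.

```python
from itertools import combinations


def gap_free_columns(alignment, gap_chars="-."):
    """Return indices of columns in which no sequence carries a gap.

    `alignment` maps a species name to its aligned sequence. Treating a
    column as masked when ANY sequence has a gap is an assumption.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    seqs = list(alignment.values())
    return [
        i for i in range(lengths.pop())
        if all(seq[i] not in gap_chars for seq in seqs)
    ]


def pairwise_identities(alignment):
    """Percent identity for every species pair over gap-free columns."""
    columns = gap_free_columns(alignment)
    if not columns:
        raise ValueError("alignment has no gap-free columns")
    identities = {}
    for a, b in combinations(alignment, 2):
        seq_a, seq_b = alignment[a].upper(), alignment[b].upper()
        matches = sum(1 for i in columns if seq_a[i] == seq_b[i])
        identities[(a, b)] = 100.0 * matches / len(columns)
    return identities


# Toy alignment (invented for illustration, not real CDS data).
example = {
    "nmr":        "ATGGCC-AAGTGA",
    "guinea_pig": "ATGGCCTAAGTGA",
    "human":      "ATGGCTTAAATGA",
    "mouse":      "ATGACTTAA-TGA",
}

if __name__ == "__main__":
    for pair, value in pairwise_identities(example).items():
        print(pair, f"{value:.1f} %")
```

Masking columns across the whole multiple alignment, rather than per pair, keeps the denominator identical for all species pairs, which makes the resulting percentages directly comparable.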

Provider: Figshare

Created: 2016-12-15
