Additional file 1: Figure S1. of OGS2: genome re-annotation of the jewel wasp Nasonia vitripennis

Mendeley Data. Updated 2024-06-25. Indexed 2024-06-27.

Download link:

https://springernature.figshare.com/articles/dataset/Additional_file_1_Figure_S1_of_OGS2_genome_re-annotation_of_the_jewel_wasp_Nasonia_vitripennis/4438373

Description:

Figure S1. Expression values relative to gene structures for RNA-Seq (Reads) and genome tiling path microarrays (Tile) for the species Nasonia (purple, this project), Drosophila (red, blue, [74]) and Daphnia (green, [107]). Annotated gene near-exon spans are scored per base for average expression scores from the data sets, and relative expression is plotted with respect to gene transcript start (first exon), stop (last exon), and inner exon start and stop positions. Both methods (genome tiling and RNA-Seq) show abrupt changes in expression strength at exon boundaries, on average, indicating their value in modeling gene structure positions. Expression scores are read coverage for RNA-Seq and log-normalized intensity for the tiling array, as described in the Methods section.

Figure S2. Gene modeling example with tile expression data, including gene evidence (upper tracks with tiling, introns, proteins), tiling TAR-exon to Exonerate models (middle), and gene predictions from tile TAR hints (lower), on the genome map. The lower tracks have excessive false UTR spans attached to gene models, primarily because tiling expression lacks gene start/stop and intron splice joining signals. These false UTR spans are supported by expression evidence, but as a combination of alternate exons, separate gene loci, and non-coding expression. Intermediate tracks (Exonerate models) often match gene structures from other methods, but have a high proportion of unsupported exon extensions, as in the lower track.

Figure S3. Gene join error example. A mistaken gene model from honey bee (tan, lower, LOC552483) is transferred to Nasonia in NCBI RefSeq models (dark orange, middle), merging a ribosomal protein (right) and an Ankyrin repeat protein (left). EvidentialGene models (yellow, top) did not contain this mistake, due to the combination of RNA-Seq assemblies (purple, bottom) that are un-joined (but could be parts of one gene), the lack of intron joining evidence, and the orthology assessment metrics that distinguish gene joins from true complete genes. NCBI RefSeq models for both Apis (new LOC102654426 and mRpL52 in NCBI Apis rel. 102) and Nasonia have been updated to correct this join error.

Figure S4. Log counts of methylated and unmethylated genes in different classes of expression support. Grey bars indicate genes with no known methylation status.

(ZIP 938 kb)
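The per-base scoring around exon boundaries described for Figure S1 can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `boundary_profile`, the dictionary-of-lists coverage layout, the 0-based coordinates, and the forward-strand-only handling are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def boundary_profile(
    coverage: Dict[str, Sequence[float]],
    boundaries: Sequence[Tuple[str, int]],
    flank: int = 50,
) -> List[float]:
    """Average per-base score at offsets -flank .. flank-1 around boundaries.

    coverage   : hypothetical layout, sequence name -> per-base score
                 (read coverage or log-normalized tile intensity).
    boundaries : (sequence name, 0-based position) of each exon boundary,
                 assumed to lie on the forward strand.
    """
    sums = [0.0] * (2 * flank)
    counts = [0] * (2 * flank)
    for seq_name, pos in boundaries:
        track = coverage[seq_name]
        for i, offset in enumerate(range(-flank, flank)):
            idx = pos + offset
            # Skip offsets that fall off either end of the sequence.
            if 0 <= idx < len(track):
                sums[i] += track[idx]
                counts[i] += 1
    # Offsets never observed are reported as 0.0.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


# Toy data: two exons starting at positions 3 and 10 on one sequence.
demo_coverage = {"scaffold1": [0, 0, 0, 5, 5, 5, 0, 0, 0, 0, 4, 4, 4, 0]}
demo_exon_starts = [("scaffold1", 3), ("scaffold1", 10)]
demo_profile = boundary_profile(demo_coverage, demo_exon_starts, flank=2)
```

In this toy case the averaged profile steps from zero to a positive value exactly at offset 0, which is the kind of abrupt change at exon boundaries that the caption describes. Real data would also need reverse-strand boundaries to be flipped before averaging.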


Created:

2023-06-28