A Third Approach to Gene Prediction Suggests Thousands of Additional Human Transcribed Regions

Source: NIAID Data Ecosystem (indexed 2026-03-06)

Download link:

https://figshare.com/articles/dataset/A_Third_Approach_to_Gene_Prediction_Suggests_Thousands_of_Additional_Human_Transcribed_Regions/153000

Description:

The identification and characterization of the complete ensemble of genes is a main goal of deciphering the digital information stored in the human genome. Many algorithms for computational gene prediction have been described, ultimately derived from two basic concepts: (1) modeling gene structure and (2) recognizing sequence similarity. Successful hybrid methods combining these two concepts have also been developed. We present a third orthogonal approach to gene prediction, based on detecting the genomic signatures of transcription, accumulated over evolutionary time. We discuss four algorithms based on this third concept: Greens and CHOWDER, which quantify mutational strand biases caused by transcription-coupled DNA repair, and ROAST and PASTA, which are based on strand-specific selection against polyadenylation signals. We combined these algorithms into an integrated method called FEAST, which we used to predict the location and orientation of thousands of putative transcription units not overlapping known genes. Many of the newly predicted transcriptional units do not appear to code for proteins. The new algorithms are particularly apt at detecting genes with long introns and lacking sequence conservation. They therefore complement existing gene prediction methods and will help identify functional transcripts within many apparent “genomic deserts.”
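The description says ROAST and PASTA rely on strand-specific selection against polyadenylation signals. The toy Python sketch below illustrates only that general idea and is not the authors' implementation: it assumes the canonical AATAAA hexamer as the signal, and the function names and the scoring formula are hypothetical choices made for illustration. Within a window, it compares signal counts on the two strands; a positive score means the forward strand is relatively depleted, which would be consistent with forward-strand transcription under this simplified model.

```python
# Toy illustration only: NOT the published ROAST/PASTA algorithms.
# Assumption: the canonical poly(A) signal hexamer AATAAA is the motif of interest.
POLYA_SIGNAL = "AATAAA"

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; unknown characters become N."""
    return "".join(_COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def count_overlapping(seq: str, motif: str) -> int:
    """Count motif occurrences, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def polya_strand_bias(window: str) -> float:
    """Hypothetical score in [-1, 1] for one genomic window.

    forward: AATAAA read on the given (forward) strand.
    reverse: AATAAA read on the opposite strand, which appears as
             TTTATT in the forward-strand sequence.
    Positive values mean fewer signals on the forward strand.
    """
    seq = window.upper()
    forward = count_overlapping(seq, POLYA_SIGNAL)
    reverse = count_overlapping(seq, reverse_complement(POLYA_SIGNAL))
    total = forward + reverse
    if total == 0:
        return 0.0  # no signal on either strand, so no evidence
    return (reverse - forward) / total
```

A real method would need a background model for expected motif frequencies and a statistical test across many windows; this sketch deliberately omits both.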

Created:

2006-03-17
