Targeted RT-PCR assays spanning unannotated splice junctions sequenced by Roche 454. Homo sapiens

NIAID Data Ecosystem, indexed 2026-03-07

Download link:

https://www.ncbi.nlm.nih.gov/bioproject/PRJNA169392


Description:

The ENCODE project seeks to identify and characterize functional elements in the human genome. Throughout the scale-up phase of ENCODE, the transcriptome group has generated Long RNA-Seq, Small RNA-Seq, Cap-Analysis of Gene Expression (CAGE), and RNA-PET short-read data on the Illumina platform for ~40 different human primary and transformed cell lines in replicate. From these data, several high-resolution, discrete features/elements have been mined (5’ caps, splice junctions, polyadenylation sites, small RNAs, etc.). However, because these features are derived from short-read data, we have only limited “connectivity” information. For example, from the long RNA-Seq data, which was sequenced in mate-pair fashion with average insert sizes of ~200 bp, we know that the sequence from mate 1 is physically linked to the sequence in mate 2. We do not know the sequence in between, and we do not know how this mate-pair is connected to other mate-pairs in the context of longer transcripts in vivo. To date, this information has been gleaned from models generated in silico, in our case by the program Cufflinks. Consequently, we have a collection of transcript models, assembled from short-read data and exhibiting a vast array of local complexity, that need to be experimentally tested. Alternatively, one can “cut to the chase” and use a more raw/elemental form of the data as a basis for additional experimentation to clone out the longer sequences generated in vivo. For data usage terms and conditions, please refer to http://www.genome.gov/27528022 and http://www.genome.gov/Pages/Research/ENCODE/ENCODEDataReleasePolicyFinal2008.pdf. Overall design: 454 data from HepG2, HUVEC, and H1 ES cells.


Created:

2012-06-22
