
Discovering Functional Modules by Topic Modeling RNA-Seq Based Toxicogenomic Data

Figshare | Updated 2016-02-16 | Indexed 2026-04-29
Download link:
https://figshare.com/articles/dataset/Discovering_Functional_Modules_by_Topic_Modeling_RNA_Seq_Based_Toxicogenomic_Data/2255170
Description:
Toxicogenomics (TGx) endeavors to elucidate underlying molecular mechanisms by exploring gene expression profiles in response to toxic substances. RNA-Seq has increasingly been regarded as a more powerful alternative to microarrays in TGx studies. However, realizing RNA-Seq's full potential requires novel approaches to extracting information from complex TGx data. If read counts are treated as the number of times a word occurs in a document, gene expression profiles from RNA-Seq are analogous to the word-by-document matrix used in text mining. Topic modeling, which aims to discover latent structures in text corpora, could therefore help explore RNA-Seq based TGx data. In this study, topic modeling was applied to a typical RNA-Seq based TGx data set to discover hidden functional modules. The RNA-Seq based gene expression profiles were transformed into "documents", on which latent Dirichlet allocation (LDA) was used to build a topic model. We found that samples treated with compounds sharing the same mode of action (MoA) could be clustered based on topic similarities. The topic most relevant to each cluster was identified as a "marker" topic, which was interpreted with respect to MoAs by gene enrichment analysis and then confirmed by compound and pathway associations mined from the literature. To further validate the "marker" topics, we tested topic transferability from RNA-Seq to microarrays. The RNA-Seq based gene expression profile of a topic specifically associated with the peroxisome proliferator-activated receptor (PPAR) signaling pathway was used to query samples with similar expression profiles in two different microarray data sets, yielding an accuracy of about 85%. This proof-of-concept study demonstrates the applicability of topic modeling to discovering functional modules in RNA-Seq data and suggests a valuable computational tool for leveraging information within TGx data in the RNA-Seq era.
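
The read-count-as-word-count analogy in the description can be illustrated with a small sketch. The description says only that LDA was used; it does not name an inference algorithm, so the collapsed Gibbs sampler below is a stand-in chosen because it fits in a few lines of standard-library Python. It is not the authors' pipeline. The gene labels, read counts, number of topics, and hyperparameters (`alpha`, `beta`, `n_iter`) are all hypothetical.

```python
import math
import random


def counts_to_documents(counts, genes):
    """Turn each sample's read counts into a 'document': a list of gene
    tokens in which every gene is repeated as many times as its count."""
    return [[g for g, c in zip(genes, row) for _ in range(c)] for row in counts]


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method)
    and return one topic-proportion vector per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # tokens of word v in topic k
    topic_total = [0] * n_topics                           # all tokens in topic k

    # Random initial topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed per-document topic proportions.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha)
         for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical toy data: four samples (rows) by four genes (columns).
genes = ["gene_a", "gene_b", "gene_c", "gene_d"]
counts = [
    [30, 25, 1, 0],
    [28, 32, 0, 2],
    [1, 0, 27, 31],
    [0, 2, 33, 26],
]

docs = counts_to_documents(counts, genes)
theta = lda_gibbs(docs, n_topics=2)

for i, proportions in enumerate(theta):
    print(f"sample {i}: {[round(p, 3) for p in proportions]}")
print("similarity(sample 0, sample 1):", round(cosine(theta[0], theta[1]), 3))
print("similarity(sample 0, sample 2):", round(cosine(theta[0], theta[2]), 3))
```

Samples could then be grouped by comparing their topic-proportion vectors, for example with the cosine similarity printed above. How closely that matches the similarity measure and clustering used in the study is not stated in the description.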

Created:
2016-02-16