Discovering Functional Modules by Topic Modeling RNA-Seq Based Toxicogenomic Data
Source: NIAID Data Ecosystem (indexed 2026-03-09)
Download link:
https://figshare.com/articles/dataset/Discovering_Functional_Modules_by_Topic_Modeling_RNA_Seq_Based_Toxicogenomic_Data/2255170
Description:
Toxicogenomics (TGx) endeavors to
elucidate the underlying molecular
mechanisms through exploring gene expression profiles in response
to toxic substances. RNA-Seq is increasingly regarded as
a more powerful alternative to microarrays in TGx studies. However,
realizing RNA-Seq’s full potential requires novel approaches
to extracting information from the complex TGx data. If read counts
are treated as the number of times a word occurs in a document, gene
expression profiles from RNA-Seq are analogous to the word-by-document
matrix used in text mining. Topic modeling, which aims to discover the
latent structures in text corpora, could therefore help explore RNA-Seq
based TGx data.
In this study, topic modeling was applied to a typical RNA-Seq based
TGx data set to discover hidden functional modules. The RNA-Seq based
gene expression profiles were transformed into “documents”,
on which latent Dirichlet allocation (LDA) was used to build a topic
model. We found that samples treated with compounds sharing the same
modes of action (MoAs) could be clustered based on topic similarity.
The topic most relevant to each cluster was identified as a “marker”
topic, which was related to MoAs by gene enrichment analysis and
then confirmed by compound-pathway associations mined from the literature.
To further validate the “marker” topics, we tested topic
transferability from RNA-Seq to microarrays. The RNA-Seq based gene
expression profile of a topic specifically associated with the peroxisome
proliferator-activated receptor (PPAR) signaling pathway was used
to query samples with similar expression profiles in two different
microarray data sets, yielding an accuracy of about 85%. This proof-of-concept
study demonstrates the applicability of topic modeling to discover
functional modules in RNA-Seq data and suggests a valuable computational
tool for leveraging information within TGx data in the RNA-Seq era.
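
The read-count-as-word-count analogy described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the count matrix is a made-up toy, the function names (`counts_to_documents`, `fit_lda`, `cosine`) and hyperparameters are assumptions chosen for illustration, and LDA is fitted here with a small collapsed Gibbs sampler, which may differ from the inference method used in the study. Real RNA-Seq counts are far larger and would likely need scaling before being expanded into "documents".

```python
import random

def counts_to_documents(count_matrix):
    """Turn each sample's read counts into a bag of gene 'words' (gene indices)."""
    docs = []
    for row in count_matrix:
        doc = []
        for gene_idx, count in enumerate(row):
            doc.extend([gene_idx] * count)  # a gene read `count` times occurs `count` times
        docs.append(doc)
    return docs

def fit_lda(docs, n_genes, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-sample topic proportions."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]
    topic_gene = [[0] * n_genes for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []
    # Random initial topic assignment for every word occurrence.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(n_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_gene[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current assignment, then resample it.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_gene[z][w] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_gene[k][w] + beta)
                    / (topic_total[k] + n_genes * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_gene[z][w] += 1
                topic_total[z] += 1
    # Smoothed topic proportions per sample (each row sums to 1).
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][k] + alpha) / denom for k in range(n_topics)])
    return theta

def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)

# Toy data (invented): 4 samples x 4 genes; samples 0-1 and 2-3 share a pattern.
counts = [
    [9, 8, 1, 0],
    [8, 9, 0, 1],
    [1, 0, 9, 8],
    [0, 1, 8, 9],
]
docs = counts_to_documents(counts)
theta = fit_lda(docs, n_genes=4, n_topics=2)
similarity_01 = cosine(theta[0], theta[1])  # samples with a shared pattern
similarity_02 = cosine(theta[0], theta[2])  # samples with different patterns
```

Comparing `similarity_01` with `similarity_02` mirrors, on a toy scale, the idea of clustering samples by the similarity of their topic proportions.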
Created:
2016-02-16



