Supporting data for "Imputing missing RNA-seq data from DNA methylation by using transfer learning based neural network"
Source: DataCite Commons · Updated: 2025-05-26 · Indexed: 2025-04-15
Download link:
http://gigadb.org/dataset/100767
Description:
Gene expression plays a key intermediate role in linking molecular features at the DNA level to phenotype. However, owing to various experimental limitations, RNA-seq data are missing for many samples that nonetheless have high-quality DNA methylation data. Because DNA methylation is an important epigenetic modification that regulates gene expression, it can be used to predict RNA-seq data, and many methods have been developed for this purpose. A common limitation of these methods is that they focus mainly on a single cancer dataset and do not fully exploit the information in large pan-cancer datasets.

Here, we have developed TDimpute, a novel method that imputes missing gene expression data from DNA methylation data using a transfer learning-based neural network. In this method, the pan-cancer dataset from The Cancer Genome Atlas (TCGA) was used to train a general model, which was then fine-tuned on the specific cancer dataset. In tests on 16 cancer datasets, our method significantly outperformed other state-of-the-art methods, improving imputation accuracy by 7% to 11% under different missing rates. The imputed gene expression also proved useful for downstream analyses on the TCGA dataset, including the identification of both methylation-driving and prognosis-related genes, clustering analysis, and survival analysis. More importantly, an independent test on the Wilms tumor dataset from the Therapeutically Applicable Research To Generate Effective Treatments (TARGET) project indicated that the method is useful for general purposes. TDimpute is an effective method for RNA-seq imputation with limited training samples.
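The two-stage scheme in the description (train a general model on pooled pan-cancer data, then fine-tune it on one cancer type) can be illustrated with a toy sketch. It is not TDimpute's code or architecture. It assumes synthetic data in which expression is a noisy linear function of methylation beta values, and a linear model trained by plain per-sample gradient descent stands in for the neural network. All names (`make_weights`, `make_samples`, `train`, and so on) are hypothetical. The sketch compares a fine-tuned copy of the general model with a model trained from scratch on the same handful of target samples, using the same small training budget.

```python
import copy
import random


def make_weights(n_genes, n_cpgs, rng, base=None, shift=0.2):
    """Hypothetical ground-truth methylation-to-expression weights.

    base=None draws a fresh 'pan-cancer' relationship; otherwise each entry
    of base is perturbed slightly to mimic a related 'target cancer'.
    """
    if base is None:
        return [[rng.uniform(-2, 2) for _ in range(n_cpgs)] for _ in range(n_genes)]
    return [[w + rng.gauss(0, shift) for w in row] for row in base]


def make_samples(weights, n_samples, rng, noise=0.1):
    """Toy samples: beta values in [0, 1] and noisy linear expression values."""
    X, Y = [], []
    for _ in range(n_samples):
        x = [rng.random() for _ in range(len(weights[0]))]
        y = [sum(w * xj for w, xj in zip(row, x)) + rng.gauss(0, noise) for row in weights]
        X.append(x)
        Y.append(y)
    return X, Y


def new_model(n_genes, n_cpgs):
    """Zero-initialised linear model (W, b): a stand-in for the neural network."""
    return [[0.0] * n_cpgs for _ in range(n_genes)], [0.0] * n_genes


def predict(model, x):
    W, b = model
    return [sum(w * xj for w, xj in zip(row, x)) + bg for row, bg in zip(W, b)]


def train(model, X, Y, epochs, lr=0.05):
    """Per-sample gradient descent on squared error; updates model in place."""
    W, b = model
    for _ in range(epochs):
        for x, y in zip(X, Y):
            for g, (p, t) in enumerate(zip(predict(model, x), y)):
                err = p - t
                for j, xj in enumerate(x):
                    W[g][j] -= lr * err * xj
                b[g] -= lr * err
    return model


def mse(model, X, Y):
    errs = [(p - t) ** 2 for x, y in zip(X, Y) for p, t in zip(predict(model, x), y)]
    return sum(errs) / len(errs)


rng = random.Random(0)
N_GENES, N_CPGS = 3, 5

# Stage 1: pre-train a general model on a large pooled ("pan-cancer") dataset.
pan_w = make_weights(N_GENES, N_CPGS, rng)
X_pan, Y_pan = make_samples(pan_w, 500, rng)
general = train(new_model(N_GENES, N_CPGS), X_pan, Y_pan, epochs=20)

# Target cancer: a shifted relationship with only 10 paired training samples;
# 200 more samples play the role of samples whose RNA-seq data are missing.
target_w = make_weights(N_GENES, N_CPGS, rng, base=pan_w)
X_tr, Y_tr = make_samples(target_w, 10, rng)
X_miss, Y_miss = make_samples(target_w, 200, rng)

# Stage 2: fine-tune a copy of the general model vs. train from scratch,
# both with the same small budget (10 epochs over 10 samples).
fine_tuned = train(copy.deepcopy(general), X_tr, Y_tr, epochs=10)
scratch = train(new_model(N_GENES, N_CPGS), X_tr, Y_tr, epochs=10)

ft_mse = mse(fine_tuned, X_miss, Y_miss)
scratch_mse = mse(scratch, X_miss, Y_miss)
print(f"fine-tuned MSE: {ft_mse:.4f}  from-scratch MSE: {scratch_mse:.4f}")
```

In this toy setting the fine-tuned model starts close to the target relationship, so a short training run on few samples is enough, whereas the from-scratch model is still far from converged. This mirrors the motivation for pre-training on pan-cancer data when target-cancer training samples are limited, without claiming anything about TDimpute's actual gains.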
Provided by:
GigaScience Database
Created:
2020-06-24



