Table1_WAFNRLTG: A Novel Model for Predicting LncRNA Target Genes Based on Weighted Average Fusion Network Representation Learning Method.DOCX
Source: NIAID Data Ecosystem. Indexed: 2026-03-13
Download link:
https://figshare.com/articles/dataset/Table1_WAFNRLTG_A_Novel_Model_for_Predicting_LncRNA_Target_Genes_Based_on_Weighted_Average_Fusion_Network_Representation_Learning_Method_DOCX/18666083
Description:
Long non-coding RNAs (lncRNAs) do not encode proteins, yet they are well established as participants in complex regulatory functions, and lncRNA regulatory dysfunction can lead to a variety of complex human diseases. LncRNAs mostly exert their functions by regulating the expression of target genes, so accurate prediction of potential lncRNA target genes would help clarify the functional annotation of lncRNAs. To address the limitations of traditional computational methods for predicting lncRNA target genes, a novel model named Weighted Average Fusion Network Representation learning for predicting LncRNA Target Genes (WAFNRLTG) was proposed. First, a heterogeneous network was constructed by integrating five networks: an lncRNA sequence similarity network, an mRNA sequence similarity network, an lncRNA-mRNA interaction network, an lncRNA-miRNA interaction network and an mRNA-miRNA interaction network. Next, four popular network representation learning methods were used to obtain representation vectors of the lncRNA and mRNA nodes. Then, the representations of lncRNAs and target genes in the heterogeneous network were obtained with the weighted average fusion network representation learning method. Finally, the representations of lncRNAs and related target genes were merged to form lncRNA-gene pairs, an XGBoost classifier was trained on them, and potential lncRNA target genes were predicted. In five-fold cross-validation on the training and independent datasets, WAFNRLTG obtained better area under the ROC curve (AUC) scores (0.9410 and 0.9350, respectively) and area under the precision-recall curve (AUPR) scores (0.9391 and 0.9350, respectively). Moreover, case studies of three common lncRNAs were performed to predict their potential target genes, and the results confirmed the effectiveness of WAFNRLTG. The source code and all data of WAFNRLTG can be freely downloaded at https://github.com/HGDYZW/WAFNRLTG.
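The fusion and pairing steps described above can be illustrated with a minimal sketch. It is not taken from the WAFNRLTG repository: the function names, the toy vectors, and the weights are hypothetical, and the description does not say how the weights are chosen or which four embedding methods are used. The sketch also assumes that "merging" a lncRNA and a gene representation means concatenating them.

```python
from typing import List, Sequence


def weighted_average_fusion(
    embeddings: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Fuse several same-length embeddings of ONE node into one vector.

    Each entry of `embeddings` is assumed to come from a different network
    representation learning method; `weights` holds one (hypothetical)
    non-negative weight per method.
    """
    if len(embeddings) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    # Weighted average, computed independently for each coordinate.
    return [
        sum(w * e[i] for w, e in zip(weights, embeddings)) / total
        for i in range(dim)
    ]


def pair_features(lnc_vec: Sequence[float], gene_vec: Sequence[float]) -> List[float]:
    """Build one lncRNA-gene pair sample by concatenation (an assumption)."""
    return list(lnc_vec) + list(gene_vec)


# Toy 2-dimensional embeddings of one lncRNA and one gene from four methods.
lnc_views = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
gene_views = [[2.0, 2.0], [2.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
method_weights = [2.0, 1.0, 1.0, 0.0]  # hypothetical weights

lnc_fused = weighted_average_fusion(lnc_views, method_weights)
gene_fused = weighted_average_fusion(gene_views, method_weights)
sample = pair_features(lnc_fused, gene_fused)
```

In a full pipeline, vectors like `sample` would be stacked into a feature matrix, labeled as known or unknown lncRNA-gene pairs, and passed to a classifier such as XGBoost.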
Created:
2022-01-19



