Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) Predictions
Source: DataCite Commons | Updated: 2020-08-28 | Indexed: 2025-04-09
Download link:
https://iastate.figshare.com/articles/Tissue-spEcific_mrNa_iSoform_functIOnal_Networks_TENSION_Predictions/7218866/2
Description:
This folder contains the input and predictions of the random forest model used to develop the Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION). The README file describes the contents of the files.<br>Alternative splicing produces multiple mRNA isoforms of a gene, and these isoforms have important and diverse roles, for example in the regulation of gene expression, human heritable diseases, and responses to environmental stresses. However, very little has been done to assign functions at the mRNA isoform level. Functional networks, in which each interaction is quantified by the probability that the pair is involved in the same biological process, are typically generated at the gene level. We use a diverse array of tissue-specific RNA-seq datasets and sequence information to train random forest models that predict the functional networks, following a leave-one-tissue-out strategy. Since there is no mRNA isoform-level gold standard, we treat single-isoform genes co-annotated to Gene Ontology biological process annotations, Kyoto Encyclopedia of Genes and Genomes pathways, BioCyc pathways and protein-protein interactions as functionally related (positive pairs). To generate the non-functional pairs (negative pairs), we use the Gene Ontology annotations tagged with the “NOT” qualifier. We describe 17 Tissue-spEcific mrNa iSoform functIOnal Networks (TENSION) in addition to an organism-level reference functional network for mouse. We validate our predictions by comparing their performance with previous methods, randomized positive and negative class labels, and updated Gene Ontology annotations, and by literature evidence.<br><b>Version 2:</b> improvements were made to the framework, resulting in better performance, and new datasets were added.
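The leave-one-tissue-out strategy mentioned in the description can be illustrated with a minimal sketch. This is not the authors' pipeline: the feature matrix, labels, tissue names, and the function `leave_one_tissue_out` are all hypothetical, the data are synthetic, and the sketch assumes that "leave-one-tissue-out" means holding out every isoform pair from one tissue while training on the rest. It uses scikit-learn's `LeaveOneGroupOut` and `RandomForestClassifier`.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut


def leave_one_tissue_out(features, labels, tissues, n_trees=50, seed=0):
    """Fit one random forest per held-out tissue (hypothetical helper).

    Returns a dict mapping each tissue name to the predicted probability
    of the positive class (label 1) for the pairs held out in that tissue.
    """
    splitter = LeaveOneGroupOut()
    scores = {}
    for train_idx, test_idx in splitter.split(features, labels, groups=tissues):
        model = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        model.fit(features[train_idx], labels[train_idx])
        # Locate the probability column that corresponds to label 1.
        pos_col = list(model.classes_).index(1)
        held_out = str(tissues[test_idx][0])
        scores[held_out] = model.predict_proba(features[test_idx])[:, pos_col]
    return scores


# Synthetic stand-in data (assumption): 3 tissues, 40 isoform pairs each,
# 5 numeric features per pair, label 1 = functionally related pair.
rng = np.random.default_rng(0)
tissues = np.repeat(np.array(["heart", "liver", "brain"]), 40)
labels = np.tile(np.array([0, 1]), tissues.size // 2)
features = rng.normal(size=(tissues.size, 5)) + labels[:, None]

scores = leave_one_tissue_out(features, labels, tissues)
for tissue, probs in scores.items():
    print(tissue, probs.shape)
```

Each held-out tissue receives scores from a model that never saw that tissue during training, which is the property a leave-one-group-out split is meant to guarantee. The actual features and evaluation used for TENSION are documented in the dataset's README rather than here.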
Provider:
Iowa State University
Created:
2019-01-09



