
Gene Function Prediction from Functional Association Networks Using Kernel Partial Least Squares Regression

Source: NIAID Data Ecosystem (indexed 2026-03-08)
Download link:
https://figshare.com/articles/dataset/_Gene_Function_Prediction_from_Functional_Association_Networks_Using_Kernel_Partial_Least_Squares_Regression_/1514383
Description:
With the growing availability of large-scale biological datasets, automated methods of extracting functionally meaningful information from this data are becoming increasingly important. Data relating to functional association between genes or proteins, such as co-expression or functional association, is often represented in terms of gene or protein networks. Several methods of predicting gene function from these networks have been proposed. However, evaluating the relative performance of these algorithms may not be trivial: concerns have been raised over biases in different benchmarking methods and datasets, particularly relating to non-independence of functional association data and test data. In this paper we propose a new network-based gene function prediction algorithm using a commute-time kernel and partial least squares regression (Compass). We compare Compass to GeneMANIA, a leading network-based prediction algorithm, using a number of different benchmarks, and find that Compass outperforms GeneMANIA on these benchmarks. We also explicitly explore problems associated with the non-independence of functional association data and test data. We find that a benchmark based on the Gene Ontology database, which, directly or indirectly, incorporates information from other databases, may considerably overestimate the performance of algorithms exploiting functional association data for prediction.
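As a rough illustration of the kernel named in the description, the sketch below computes a commute-time kernel for a small undirected network as the Moore-Penrose pseudoinverse of the graph Laplacian, and derives expected commute times from it. This is a minimal sketch under stated assumptions, not the Compass implementation: the function names `commute_time_kernel` and `commute_times` are hypothetical, the graph is assumed to be connected, symmetric, and non-negatively weighted, and the partial least squares regression step is not shown.

```python
import numpy as np


def commute_time_kernel(adjacency):
    """Pseudoinverse of the graph Laplacian L = D - A (hypothetical helper).

    Assumes a symmetric, non-negative adjacency matrix of a connected graph.
    """
    A = np.asarray(adjacency, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        raise ValueError("adjacency must be a square matrix")
    if not np.allclose(A, A.T):
        raise ValueError("adjacency must be symmetric (undirected graph)")
    degrees = A.sum(axis=1)
    laplacian = np.diag(degrees) - A
    # The Laplacian is singular, so use the Moore-Penrose pseudoinverse.
    return np.linalg.pinv(laplacian)


def commute_times(adjacency):
    """Expected commute time between every pair of nodes (hypothetical helper).

    Uses n(i, j) = vol(G) * (K_ii + K_jj - 2 * K_ij), where K is the
    commute-time kernel and vol(G) is the sum of all node degrees.
    """
    A = np.asarray(adjacency, dtype=float)
    K = commute_time_kernel(A)
    volume = A.sum()  # sum of degrees of all nodes
    diag = np.diag(K)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * K)


if __name__ == "__main__":
    # Toy path network: gene0 - gene1 - gene2, unit edge weights.
    toy = np.array([[0, 1, 0],
                    [1, 0, 1],
                    [0, 1, 0]])
    print(np.round(commute_times(toy), 6))
```

In a gene function prediction setting, the kernel matrix would play the role of a similarity between genes, with larger entries for pairs that are well connected through many short paths. How the network is weighted or normalized before this step is not stated in the description, so those choices are left open here.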

Created: 2016-01-15