Table4_A Novel Collaborative Filtering Model-Based Method for Identifying Essential Proteins.XLSX

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

https://figshare.com/articles/dataset/Table4_A_Novel_Collaborative_Filtering_Model-Based_Method_for_Identifying_Essential_Proteins_XLSX/16841455

Description:

Because traditional biological experiments are expensive and time-consuming, effective computational models for inferring potential essential proteins are needed. This manuscript proposes CFMM, a novel collaborative filtering model-based method. CFMM first constructs an updated protein–domain interaction (PDI) network by applying a collaborative filtering algorithm to the original PDI network. It then integrates topological features of the PDI networks with biological features of proteins and infers potential essential proteins using a calculation method based on an improved PageRank algorithm. The novelties of CFMM lie in the construction of the updated PDI network, the application of the commodity-customer-based collaborative filtering algorithm, and the introduction of the calculation method based on an improved PageRank algorithm; together these allow CFMM to predict essential proteins without relying entirely on known protein–domain associations. In simulations based on the DIP database, CFMM achieved prediction accuracies of 92.16, 83.14, 71.37, 63.87, 55.84, and 52.43% among the top 1, 5, 10, 15, 20, and 25% of predicted candidate key proteins, respectively, which overall are remarkably higher than those of 14 competitive state-of-the-art predictive models. CFMM also achieved satisfactory predictive performance on different databases under various evaluation measures, which further indicates that CFMM may be a useful tool for identifying essential proteins.
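The "top k%" accuracies quoted in the description can be read as the fraction of known essential proteins among the highest-ranked candidates. The following is a minimal sketch of that reading, not the authors' evaluation code: the function name `top_percent_precision`, the toy scores, the toy essential set, and the use of `ceil` to size the cutoff are all assumptions made for illustration.

```python
import math


def top_percent_precision(scores, essential, percent):
    """Fraction of the top `percent`% ranked proteins that are known essential.

    scores:    dict mapping protein ID -> predicted score (higher = more likely essential)
    essential: set of protein IDs known to be essential
    percent:   cutoff in percent, e.g. 1, 5, 10, 15, 20, 25
    """
    # Rank proteins by descending predicted score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumed rounding rule: round the cutoff up and keep at least one protein.
    k = max(1, math.ceil(len(ranked) * percent / 100))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k


# Hypothetical toy data, not taken from the dataset.
toy_scores = {
    "P1": 0.90, "P2": 0.80, "P3": 0.70, "P4": 0.40,
    "P5": 0.35, "P6": 0.30, "P7": 0.20, "P8": 0.10,
}
toy_essential = {"P1", "P2", "P4", "P7"}

for pct in (25, 50, 100):
    print(pct, top_percent_precision(toy_scores, toy_essential, pct))
```

With real data, `scores` would hold the ranking scores produced by a predictor and `essential` the benchmark set of essential proteins; the rounding rule for the cutoff should be matched to whatever the original study used.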

Created:

2021-10-21
