Replication Data for: Persistent spectral based ensemble learning (PerSpect-EL) for protein-protein binding affinity prediction
Source: DataCite Commons | Updated: 2025-06-03 | Indexed: 2025-04-16
Download link:
https://researchdata.ntu.edu.sg/citation?persistentId=doi:10.21979/N9/MEDJN1
Resource description:
Protein–protein interactions (PPIs) play a significant role in nearly all cellular and biological activities. Data-driven machine learning models have proven powerful for studying PPIs. However, designing efficient molecular featurization remains a major challenge for all PPI learning models. Here, we propose, for the first time, a persistent spectral (PerSpect) based PPI representation and featurization, together with PerSpect-based ensemble learning (PerSpect-EL) models for PPI binding affinity prediction. In our model, a sequence of Hodge (or combinatorial) Laplacian (HL) matrices at different scales is generated from a specially designed filtration process. PerSpect attributes, which are statistical and combinatorial properties of the spectral information of these HL matrices, are used as features for PPI characterization. Each PerSpect attribute is fed into a 1D convolutional neural network (CNN), and these CNNs are stacked together in our PerSpect-EL models. We systematically test our model on the two most commonly used datasets, SKEMPI and AB-Bind. To the best of our knowledge, our model achieves state-of-the-art results and outperforms all existing models.
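The idea of reading spectral features off a sequence of Laplacian matrices can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain distance-based filtration on a point cloud (the description only says the filtration is "specially designed"), it uses only the 0-dimensional Hodge Laplacian (the ordinary graph Laplacian), and the attribute names (`zero_count`, `min_positive`, `max`, `mean_positive`, `sum`) are hypothetical choices of "statistical and combinatorial properties". The CNN and stacking stages are not shown.

```python
import numpy as np


def graph_laplacian(points, radius):
    """0-dimensional Hodge Laplacian (graph Laplacian) at one filtration scale.

    Assumption: two points are joined by an edge when their Euclidean
    distance is at most `radius`.
    """
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    adjacency = (dists <= radius) & ~np.eye(len(points), dtype=bool)
    adjacency = adjacency.astype(float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def spectral_attributes(points, radii, tol=1e-8):
    """Summarize the Laplacian spectrum at each scale in `radii`.

    Returns one dict per scale. The chosen statistics are illustrative,
    not the attribute set used in the dataset.
    """
    rows = []
    for radius in radii:
        # eigvalsh is appropriate because the Laplacian is symmetric.
        eigvals = np.linalg.eigvalsh(graph_laplacian(points, radius))
        positive = eigvals[eigvals > tol]
        rows.append({
            "radius": float(radius),
            # Multiplicity of eigenvalue 0 = number of connected components.
            "zero_count": int(np.sum(eigvals <= tol)),
            "min_positive": float(positive.min()) if positive.size else 0.0,
            "max": float(eigvals.max()),
            "mean_positive": float(positive.mean()) if positive.size else 0.0,
            "sum": float(eigvals.sum()),
        })
    return rows
```

For the four corners of a unit square, `spectral_attributes(square, [0.5, 1.0, 1.5])` tracks the graph going from four isolated points, to a 4-cycle, to a complete graph. Collecting one attribute across all scales gives a 1D sequence, which is the kind of input a 1D CNN would consume.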
Provider:
DR-NTU (Data)
Created:
2023-06-12



