Data for "TopEC: Improved classification of enzyme function by a localized 3D protein descriptor and 3D Graph Neural Networks"

DataCite Commons, updated 2024-12-03, indexed 2025-04-16

Download link:

https://researchdata.hhu.de/handle/entry/176

Description:

Accurately annotating the molecular function of enzymes remains challenging. Computational methods can aid in this and allow for high-throughput annotation. Tools that infer enzyme function from general sequence, fold, or evolutionary information are generally successful. However, they can misclassify sequences in which a deviation in local structural features influences function. Here, we present TopEC, a 3D graph neural network based on a localized 3D descriptor that learns the chemical reactions of enzymes from (predicted) enzyme structures and predicts Enzyme Commission (EC) classes. Using the message passing frameworks of SchNet and DimeNet++, we include distance and angle information to improve predictive performance compared to regular 2D graph neural networks. We obtained significantly improved EC classification prediction over 2D GNNs (F-score: 0.72) without fold bias at residue and atomic resolutions. We also trained networks that can classify both experimental and computationally generated enzyme structures across a vast functional space (> 800 ECs). Our model is robust to uncertainties in binding site locations and to similar functions in distinct binding sites. By investigating the importance of each graph node to predictive performance, we see that TopEC networks learn from an interplay between biochemical features and local shape-dependent features. TopEC is available as a repository, including accompanying data, on GitHub: https://github.com/IBG4-CBCLab/TopEC. The data in this repository is available under the CC-BY-NC-SA 4.0 license.
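The description names two geometric ingredients. The first is a localized region of the enzyme structure. The second is message passing that uses interatomic distances (as in SchNet) and angles between edges (as in DimeNet++).

The sketch below shows, in plain Python, what those inputs look like when computed from 3D coordinates. It is an illustration under stated assumptions, not TopEC's code. The helper names `localize`, `radius_graph`, `gaussian_rbf` and `triplet_angles` are hypothetical. So are the rule of keeping everything within a radius of a single center point, the cutoff values, and the Gaussian basis parameters. The actual descriptor and featurization are defined in the TopEC repository.

```python
import math


def localize(coords, center, radius):
    """Hypothetical localization rule: indices of points within `radius` of `center`."""
    return [i for i, c in enumerate(coords) if math.dist(c, center) <= radius]


def radius_graph(coords, cutoff):
    """Directed edges (j, i), meaning j -> i, between distinct points at most `cutoff` apart."""
    edges = []
    for i, ci in enumerate(coords):
        for j, cj in enumerate(coords):
            if i != j and math.dist(ci, cj) <= cutoff:
                edges.append((j, i))
    return edges


def gaussian_rbf(d, cutoff, num_centers=8, gamma=10.0):
    """SchNet-style expansion of one distance into Gaussians on evenly spaced
    centers in [0, cutoff]; num_centers and gamma are illustrative assumptions."""
    step = cutoff / (num_centers - 1)
    return [math.exp(-gamma * (d - k * step) ** 2) for k in range(num_centers)]


def angle(center, a, b):
    """Angle in radians at `center` between the vectors center->a and center->b."""
    u = [a[t] - center[t] for t in range(3)]
    v = [b[t] - center[t] for t in range(3)]
    cos = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding


def triplet_angles(coords, edges):
    """For every pair of edges k -> j and j -> i with k != i, the angle k-j-i
    measured at j. DimeNet-style models embed this angle; here it stays raw."""
    incoming = {}
    for src, dst in edges:
        incoming.setdefault(dst, []).append(src)
    result = {}
    for j, i in edges:
        for k in incoming.get(j, []):
            if k != i:
                result[(k, j, i)] = angle(coords[j], coords[k], coords[i])
    return result
```

Take three points forming a unit right angle at the origin, plus one distant point. The distant point is dropped by `localize`, and `triplet_angles` reports π/2 for the triplet centered on the origin.

Pure Python keeps the sketch dependency-free. A practical pipeline would more likely use tensor libraries; PyTorch Geometric, for example, ships SchNet and DimeNet++ model implementations.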

Provider:

N/A

Created:

2024-12-03