Enzyme Substrate Prediction from Three-Dimensional Feature Representations Using Space-Filling Curves
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://figshare.com/articles/dataset/Enzyme_Substrate_Prediction_from_Three-Dimensional_Feature_Representations_Using_Space-Filling_Curves/22128880
Resource description:
Compact and interpretable structural feature representations are
required for accurately predicting properties and function of proteins.
In this work, we construct and evaluate three-dimensional feature
representations of protein structures based on space-filling curves
(SFCs). We focus on the problem of enzyme substrate prediction, using
two ubiquitous enzyme families as case studies: the short-chain dehydrogenase/reductases
(SDRs) and the S-adenosylmethionine-dependent methyltransferases
(SAM-MTases). Space-filling curves such as the Hilbert curve and the
Morton curve generate a reversible mapping from discretized three-dimensional
to one-dimensional representations and thus help to encode three-dimensional
molecular structures in a system-independent way and with only a few
adjustable parameters. Using three-dimensional structures of SDRs
and SAM-MTases generated using AlphaFold2, we assess the performance
of the SFC-based feature representations in predictions on a new benchmark
database of enzyme classification tasks including their cofactor and
substrate selectivity. Gradient-boosted tree classifiers yield binary
prediction accuracy of 0.77–0.91 and area under curve (AUC)
characteristics of 0.83–0.92 for the classification tasks.
We investigate the effects of amino acid encoding, spatial orientation,
and (the few) parameters of SFC-based encodings on the accuracy of
the predictions. Our results suggest that geometry-based approaches
such as SFCs are promising for generating protein structural representations
and are complementary to the existing protein feature representations
such as evolutionary scale modeling (ESM) sequence embeddings.
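
The reversible mapping from a discretized three-dimensional grid to a one-dimensional index, as described above for the Morton curve, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names (morton_encode, morton_decode, voxelize, sfc_feature_vector), the bounding-box voxelization, and the choice to sum per-point values into each cell are all hypothetical. The only adjustable parameter here is bits, the grid resolution per axis. The Hilbert curve, which preserves locality better, is not shown.

```python
def morton_encode(x, y, z, bits=4):
    """Interleave the bits of integer grid coordinates into one Morton index."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def morton_decode(code, bits=4):
    """Invert morton_encode: recover (x, y, z) from a Morton index."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z


def voxelize(coords, bits=4):
    """Map real-valued 3D points to integer cells in [0, 2**bits) per axis.

    Assumption: a cubic bounding box anchored at the per-axis minimum.
    """
    n = 2 ** bits
    mins = [min(p[k] for p in coords) for k in range(3)]
    maxs = [max(p[k] for p in coords) for k in range(3)]
    span = max(maxs[k] - mins[k] for k in range(3)) or 1.0
    return [
        tuple(min(n - 1, int((p[k] - mins[k]) / span * n)) for k in range(3))
        for p in coords
    ]


def sfc_feature_vector(coords, values, bits=4):
    """Flatten per-point values (e.g. an amino acid encoding) along the curve.

    Assumption: values falling into the same cell are summed.
    """
    vec = [0.0] * (8 ** bits)
    for cell, v in zip(voxelize(coords, bits), values):
        vec[morton_encode(*cell, bits=bits)] += v
    return vec
```

For example, sfc_feature_vector([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)], [1.0, 2.0], bits=1) yields a vector of length 8 with the first point's value at index 0 and the second's at index 7. A fixed-length vector of this kind could then be passed to a classifier; how the authors orient structures and encode residues is not specified in this description.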
Created:
2023-02-20



