ESPDHot: An Effective Machine Learning-Based Approach for Predicting Protein–DNA Interaction Hotspots
Source: NIAID Data Ecosystem. Indexed 2026-05-01.
Download link:
https://figshare.com/articles/dataset/ESPDHot_An_Effective_Machine_Learning-Based_Approach_for_Predicting_Protein_DNA_Interaction_Hotspots/25564316
Description:
Protein–DNA interactions are pivotal to various cellular processes.
Precise identification of the hotspot residues for protein–DNA
interactions holds great significance for revealing the intricate
mechanisms in protein–DNA recognition and for providing essential
guidance for protein engineering. To predict protein–DNA interaction
hotspots, this work introduces ESPDHot, an effective prediction method
based on a stacked ensemble machine learning framework. Here, an
interface residue whose mutation leads to a binding free energy change
(ΔΔG) exceeding 2 kcal/mol is defined as a hotspot. To address class
imbalance in the data set, adaptive synthetic sampling (ADASYN), an
oversampling technique, is adopted to generate synthetic minority-class
samples. For molecular features, in addition to traditional ones, we
introduce three new feature types: our proposed residue interface
preference, residue fluctuation dynamics characteristics, and
coevolutionary features. Combining the Boruta
method with our previously developed Random Grouping strategy, we
obtained an optimal set of features. Finally, a stacking classifier
is constructed to output the predictions; it integrates three
classical predictors, Support Vector Machine (SVM), XGBoost, and Artificial
Neural Network (ANN), as the first layer and a Logistic Regression (LR)
algorithm as the second. Notably, ESPDHot outperforms the current
state-of-the-art predictors, achieving superior performance on the
independent test data set, with F1, MCC, and AUC reaching 0.571, 0.516,
and 0.870, respectively.
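
The hotspot definition and two of the reported metrics can be illustrated with a minimal sketch. It is not the ESPDHot implementation: the function names, the strict ">" comparison (taken from the word "exceeding" in the description), and the toy ΔΔG values and predictions are all assumptions made for illustration, and the numbers are not taken from the data set.

```python
from math import sqrt

# Threshold from the description above: ddG exceeding 2 kcal/mol marks a hotspot.
HOTSPOT_DDG_THRESHOLD = 2.0  # kcal/mol


def label_hotspots(ddg_values, threshold=HOTSPOT_DDG_THRESHOLD):
    """Return 1 (hotspot) when ddG is strictly above the threshold, else 0.

    The strict inequality is an assumption based on the wording "exceeding".
    """
    return [1 if ddg > threshold else 0 for ddg in ddg_values]


def confusion_counts(y_true, y_pred):
    """Count true positives, true negatives, false positives, false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(y_true, y_pred):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical ddG values (kcal/mol) and hypothetical classifier output.
toy_ddg = [0.3, 2.5, 1.9, 3.1, 2.0, 0.8]
toy_true = label_hotspots(toy_ddg)
toy_pred = [0, 1, 1, 1, 0, 0]

print("labels:", toy_true)
print("F1: %.3f" % f1_score(toy_true, toy_pred))
print("MCC: %.3f" % matthews_corrcoef(toy_true, toy_pred))
```

Because hotspots are the minority class, F1 and MCC are more informative here than plain accuracy, which is consistent with the description reporting them alongside AUC.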
Created:
2024-04-08



