Supplementary Data: Uncovering DFG-out sequence propensity determinants of kinases with machine learning
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/10251162
Resource description:
General description
This submission accompanies the paper "Uncovering DFG-out sequence propensity determinants of kinases with machine learning" and covers:
Trained models (train/models_latest). A subset of these is also published as part of the KinActive tool; each model is of type `KinactiveClassifier` and can be loaded with this tool via the `kinactive.io.load()` function.
Patched sequences (patched_seqs). PDB sequences, with missing regions patched by UniProt sequences. This is a collection of lXtractor ChainSequence objects.
All labels, variables, and datasets (datasets and labels).
Initial chains and predictions for SwissProt proteins (SP_predictions).
Note on model abbreviations
The main text emphasized datasets that yielded more interpretable results. These were constructed from domain sequences, labeled as apo, inactive, DFG-in, or DFG-out, and further divided into TK and STK subsets. We refer to these datasets with the abbreviation AAIO (Apo All In or Out) to distinguish them from additional datasets.
In addition to AAIO, we explored two alternative labeling strategies:
AHAO (Apo/Holo Any Out): This dataset includes sequences from ligand-bound entries. All sequences within a 95% identity cluster are labeled as DFG-out if the cluster contains at least one sequence in this state. All others are labeled as DFG-in.
AAO (Apo Any Out): This dataset excludes sequences corresponding to ligand-bound entries but includes those with conflicting conformational tendencies within 95% identity clusters.
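The "any out" rule described for AHAO can be illustrated with a minimal sketch. This is not the code used to build the published datasets: the function name `label_any_out`, the sequence IDs, and the cluster IDs are hypothetical, and the sketch assumes cluster assignments at 95% identity and per-sequence states are already available as plain dictionaries.

```python
def label_any_out(cluster_of, state_of):
    """Label each sequence DFG-out if its cluster has any DFG-out member.

    cluster_of: sequence ID -> cluster ID (hypothetical input format).
    state_of:   sequence ID -> observed state, "DFG-in" or "DFG-out".
    """
    # Clusters that contain at least one sequence observed as DFG-out.
    clusters_with_out = {
        cluster_of[seq_id]
        for seq_id, state in state_of.items()
        if state == "DFG-out"
    }
    # Every member of such a cluster is labeled DFG-out, all others DFG-in.
    return {
        seq_id: "DFG-out" if cluster in clusters_with_out else "DFG-in"
        for seq_id, cluster in cluster_of.items()
    }


# Toy example with made-up IDs: cluster 0 has one DFG-out member, cluster 1 has none.
cluster_of = {"seqA": 0, "seqB": 0, "seqC": 1, "seqD": 1}
state_of = {"seqA": "DFG-in", "seqB": "DFG-out", "seqC": "DFG-in", "seqD": "DFG-in"}
labels = label_any_out(cluster_of, state_of)
```

In this toy input, both members of cluster 0 end up labeled DFG-out, while both members of cluster 1 remain DFG-in.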
Each of these datasets had two versions:
A seed version, denoted by a "*" symbol.
A version enriched with orthologous sequences (no special designation).
Together with the TkST datasets (encompassing TK and STK labels), which were used for testing the methodology, a total of 14 datasets were used. Both RF and XGB models were applied to each dataset with the same initial settings, resulting in 28 different models. This additional information is provided for completeness.
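The dataset and model counts can be reproduced with a short enumeration. This sketch rests on two assumptions that the description does not state explicitly: that AHAO and AAO are split into TK and STK subsets in the same way as AAIO, and that the TkST dataset also exists in seed and enriched versions. The dataset name strings are hypothetical and need not match the file names in the archive.

```python
from itertools import product

strategies = ["AAIO", "AHAO", "AAO"]  # labeling strategies from the text
subsets = ["TK", "STK"]               # assumed to apply to all three strategies
versions = ["*", ""]                  # seed version marked with "*", enriched unmarked
model_types = ["RF", "XGB"]

# Hypothetical naming scheme: <strategy>-<subset><version marker>.
datasets = [f"{s}-{k}{v}" for s, k, v in product(strategies, subsets, versions)]
# TkST datasets, assumed here to come in both versions as well.
datasets += [f"TkST{v}" for v in versions]

# One model of each type per dataset.
models = list(product(datasets, model_types))

print(len(datasets), len(models))  # 14 28
```

Under these assumptions, 3 strategies x 2 subsets x 2 versions gives 12 datasets, the TkST pair brings the total to 14, and two model types per dataset give 28 models.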
Created:
2023-12-02



