Supporting data for "ViralBindPredict: Empowering Viral Protein-Ligand Binding Sites through Deep Learning and Protein Sequence-Derived Insights"

Name: Supporting data for "ViralBindPredict: Empowering Viral Protein-Ligand Binding Sites through Deep Learning and Protein Sequence-Derived Insights"
Creator: GigaScience Database
Published: 2026-01-23 06:30:01
License: No description available

Source: DataCite Commons | Updated: 2026-01-23 | Indexed: 2026-05-03

Download link:

https://gigadb.org/dataset/102798/

Description:

The ViralBindPredict dataset was created to provide a unified, large-scale resource of viral protein-ligand interactions suitable for machine learning and reproducible benchmarking. Structural entries were collected from the Protein Data Bank (PDB) and processed to retain only viral protein chains with biologically relevant ligands. Modified residues were mapped to their canonical amino acids, ligands were filtered for chemical relevance and standardized using SMILES and InChI, and residue-level interaction labels were assigned using a 4.5 Å heavy-atom distance cutoff.

The dataset includes curated viral protein sequences, chain-specific structural files, canonical ligand identifiers, validated molecular descriptors, sequence embeddings, and residue-level interaction annotations. Multiple dataset versions are provided, including a full set and a non-redundant set clustered at 90% sequence identity. Leakage-controlled split strategies (Cluster90%, Cluster40%, NoRed-Cluster90%, NoRed-Cluster40%) separate proteins and ligands into training, validation, testing, blind protein, blind ligand, and fully blind subsets to support rigorous generalization studies.

This dataset enables the development and evaluation of computational methods for predicting viral ligand-binding residues, modeling viral drug-target interactions, benchmarking sequence-based models, and supporting antiviral discovery workflows.
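
The residue-labeling rule described above (a 4.5 Å heavy-atom distance cutoff) can be sketched as follows. This is a minimal illustration under stated assumptions, not the ViralBindPredict pipeline: the input layout (a dict mapping hypothetical residue IDs to lists of (element, (x, y, z)) tuples), the helper names label_binding_residues and is_heavy, the treatment of H and D as the only non-heavy atoms, and the inclusive comparison (distance <= 4.5 Å) are all assumptions. Only the cutoff value itself comes from the description.

```python
import math
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Atom = Tuple[str, Coord]  # (element symbol, (x, y, z) in Å); hypothetical input layout

# Heavy-atom distance cutoff taken from the dataset description.
CUTOFF_ANGSTROM = 4.5


def is_heavy(element: str) -> bool:
    """Assumption: every element except hydrogen (H) and deuterium (D) is heavy."""
    return element.strip().upper() not in {"H", "D"}


def label_binding_residues(
    residues: Dict[str, List[Atom]],
    ligand_atoms: Sequence[Atom],
    cutoff: float = CUTOFF_ANGSTROM,
) -> Dict[str, int]:
    """Label a residue 1 if any of its heavy atoms lies within `cutoff` Å of any
    ligand heavy atom, otherwise 0. The inclusive <= comparison is an assumption."""
    # Keep only ligand heavy-atom coordinates.
    ligand_xyz = [xyz for element, xyz in ligand_atoms if is_heavy(element)]
    labels: Dict[str, int] = {}
    for res_id, atoms in residues.items():
        # Brute-force check of every residue heavy atom against every ligand heavy atom.
        hit = any(
            math.dist(res_xyz, lig_xyz) <= cutoff
            for element, res_xyz in atoms
            if is_heavy(element)
            for lig_xyz in ligand_xyz
        )
        labels[res_id] = int(hit)
    return labels


# Toy example with made-up coordinates (not taken from the dataset).
residues = {
    "A:ASP25": [("C", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0))],
    # GLY27 is only 1 Å from a ligand hydrogen, which is ignored as a non-heavy atom.
    "A:GLY27": [("N", (10.0, 0.0, 0.0))],
}
ligand = [("O", (3.0, 0.0, 0.0)), ("H", (9.0, 0.0, 0.0))]

labels = label_binding_residues(residues, ligand)
print(labels)  # {'A:ASP25': 1, 'A:GLY27': 0}
```

The double loop scales with (residue atoms x ligand atoms), which is acceptable for a single chain-ligand pair; for large structures a spatial index such as a k-d tree would avoid comparing distant atoms.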

Provider:

GigaScience Database

Created:

2026-01-23