Enlarged Data Sets and Innovative Applicability Domain Characterization Empower ML Models to Reliably Bridge hERG Binding Data Gaps in Diverse Chemicals
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Enlarged_Data_Sets_and_Innovative_Applicability_Domain_Characterization_Empower_ML_Models_to_Reliably_Bridge_hERG_Binding_Data_Gaps_in_Diverse_Chemicals/29917616
Resource description:
Chemicals may cause cardiotoxicity by binding to the K+ channel encoded by the human ether-à-go-go-related gene (hERG). Given the ever-increasing number of chemicals, developing in silico models that efficiently fill the hERG binding affinity data gap is preferable to conducting time-consuming experimental tests. However, previous data sets covered a limited chemical space, which hindered the development of models with high prediction accuracy and broad applicability domains (ADs). Herein, an expanded hERG binding affinity data set containing diverse categories of chemicals was constructed and subsequently used to develop machine learning models. The ADs of the constructed models were defined by an innovative structure–activity landscape (SAL)-based AD characterization (ADSAL), which considers activity cliffs within SALs, that is, groups of molecules with similar structures but inconsistent bioactivities. The optimal model constrained by the ADSAL achieved a coefficient of determination of up to 0.89 on the external-validation set, significantly outperforming previous models. The model coupled with the ADSAL constraint was applied to predict hERG binding affinities for more than 100,000 chemicals from multiple inventories, identifying over 5,000 potential hERG blockers. The model with ADSAL can serve as an efficient and reliable tool for bridging the hERG-mediated cardiotoxicity data gap to support sound chemical management.
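The notion of an activity cliff mentioned in the description (similar structures, inconsistent bioactivities) can be illustrated with a minimal sketch. This is not the authors' ADSAL procedure, whose details are not given here. It only assumes that each molecule is represented by a set of fingerprint bit indices and one activity value, and it flags pairs whose Tanimoto similarity is high while their activities differ strongly. The function names, the toy molecules, and both thresholds (`sim_threshold`, `activity_gap`) are hypothetical choices made for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity between two sets of fingerprint bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def find_activity_cliffs(molecules, sim_threshold=0.7, activity_gap=1.0):
    """Return (id_i, id_j, similarity, gap) for structurally similar pairs
    whose activities differ by at least `activity_gap`.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    cliffs = []
    for (id_i, fp_i, act_i), (id_j, fp_j, act_j) in combinations(molecules, 2):
        sim = tanimoto(fp_i, fp_j)
        gap = abs(act_i - act_j)
        if sim >= sim_threshold and gap >= activity_gap:
            cliffs.append((id_i, id_j, sim, gap))
    return cliffs


# Hypothetical toy data: (identifier, fingerprint bits, activity value).
toy = [
    ("mol_A", frozenset({1, 2, 3, 4, 5, 6, 7, 8}), 7.2),
    ("mol_B", frozenset({1, 2, 3, 4, 5, 6, 7, 9}), 5.1),
    ("mol_C", frozenset({10, 11, 12}), 7.0),
]

cliffs = find_activity_cliffs(toy)
for id_i, id_j, sim, gap in cliffs:
    print(f"{id_i} / {id_j}: similarity={sim:.2f}, activity gap={gap:.2f}")
```

In this toy set only the pair mol_A / mol_B is flagged: the two fingerprints share 7 of 9 distinct bits, yet the activity values differ by about 2.1 units. An AD scheme that accounts for such pairs could, for example, treat predictions near them as less reliable, although how ADSAL does this is not specified in the description above.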
Created:
2025-08-15



