Can Pretrained Models Really Learn Better Molecular Representations for AI-Aided Drug Discovery?
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://figshare.com/articles/dataset/Can_Pretrained_Models_Really_Learn_Better_Molecular_Representations_for_AI-Aided_Drug_Discovery_/24902555
Resource description:
Self-supervised pretrained models are becoming increasingly popular in
AI-aided drug discovery, leading to a growing number of pretrained
models with the promise that they can extract better feature representations
for molecules. Yet, the quality of learned representations has not
been fully explored. In this work, inspired by the two phenomena of
Activity Cliffs (ACs) and Scaffold Hopping (SH) in traditional Quantitative
Structure–Activity Relationship analysis, we propose a method
named Representation-Property Relationship Analysis (RePRA) to evaluate the quality
of the representations extracted by the pretrained model and visualize
the relationship between the representations and properties. The concepts
of ACs and SH are generalized from the structure–activity context
to the representation-property context, and the underlying principles
of RePRA are analyzed theoretically. Two scores are designed to measure
the generalized ACs and SH detected by RePRA, and therefore, the quality
of representations can be evaluated. In experiments, representations
of molecules from 10 target tasks generated by 7 pretrained models
are analyzed. The results indicate that the state-of-the-art pretrained
models can overcome some shortcomings of canonical Extended-Connectivity
FingerPrints, while the correlation between the basis of the representation
space and specific molecular substructures is not explicit. Thus,
some representations could be even worse than the canonical fingerprints.
Our method enables researchers to evaluate the quality of molecular
representations generated by their proposed self-supervised pretrained
models, and our findings can guide the community to develop better
pretraining techniques to regularize the occurrence of ACs and SH.
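
The generalized notions of ACs and SH described above can be illustrated with a minimal sketch. It is not the RePRA scoring from the paper: the cosine similarity measure, the thresholds `sim_hi`, `sim_lo`, and `prop_gap`, and the function names are all assumptions chosen for illustration. Here a generalized AC is taken to be a pair of molecules with very similar representations but very different property values, and a generalized SH pair is taken to be one with dissimilar representations but similar property values.

```python
import numpy as np

def pairwise_cosine_similarity(reps):
    """Cosine similarity between all rows of a (n_molecules, dim) array."""
    reps = np.asarray(reps, dtype=float)
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T

def find_generalized_pairs(reps, props, sim_hi=0.9, sim_lo=0.3, prop_gap=1.0):
    """Return (ac_pairs, sh_pairs) as lists of index pairs (i, j) with i < j.

    Thresholds are hypothetical and would need tuning per task:
      AC pair: similarity >= sim_hi and |property difference| >= prop_gap
      SH pair: similarity <= sim_lo and |property difference| <  prop_gap
    """
    sim = pairwise_cosine_similarity(reps)
    props = np.asarray(props, dtype=float)
    diff = np.abs(props[:, None] - props[None, :])

    # Visit each unordered pair once (upper triangle, excluding the diagonal).
    i, j = np.triu_indices(len(props), k=1)
    ac_mask = (sim[i, j] >= sim_hi) & (diff[i, j] >= prop_gap)
    sh_mask = (sim[i, j] <= sim_lo) & (diff[i, j] < prop_gap)

    ac_pairs = list(zip(i[ac_mask].tolist(), j[ac_mask].tolist()))
    sh_pairs = list(zip(i[sh_mask].tolist(), j[sh_mask].tolist()))
    return ac_pairs, sh_pairs

# Toy data: three molecules with 2-D representations and one property value each.
toy_reps = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
toy_props = [5.0, 8.0, 5.2]
ac_pairs, sh_pairs = find_generalized_pairs(toy_reps, toy_props)
```

In the toy data, molecules 0 and 1 have nearly identical representations but differ by 3.0 in property value, so they form a generalized AC under these assumed thresholds; molecules 0 and 2 have orthogonal representations but close property values, so they form a generalized SH pair.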
Created:
2023-12-25



