Automated Generation of Novel Fragments Using Screening Data, a Dual SMILES Autoencoder, Transfer Learning and Syntax Correction
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Automated_Generation_of_Novel_Fragments_Using_Screening_Data_a_Dual_SMILES_Autoencoder_Transfer_Learning_and_Syntax_Correction/14669522
Description:
Fragment-based hit identification (FBHI) allows proportionately greater coverage of chemical space using fewer molecules than traditional high-throughput screening approaches. However, effectively exploiting this advantage is highly dependent on the library design. Solubility, stability, chemical complexity, chemical/shape diversity, and synthetic tractability for fragment elaboration are all critical aspects, and molecule design remains a time-consuming task for computational and medicinal chemists. Artificial neural networks have attracted considerable attention in automated de novo design applications and could also prove useful for fragment library design. Chemical autoencoders are neural networks consisting of encoder and decoder parts, which respectively compress and decompress molecular representations. The decoder is applied to samples drawn from the space of compressed representations to generate novel molecules that can be scored for properties of interest. Here, we report an autoencoder model using a recurrent neural network architecture, which was trained using 486,565 fragments curated from commercial sources, to simultaneously reconstruct both SMILES and chemical fingerprints. To explore its utility in fragment design, we applied transfer learning to the fingerprint decoder layers to train a classifier using 66 frequent hitter fragments identified from our screening campaigns. Using a particle swarm optimization sampling approach, we compare the performance of this “dual” model to an architecture encoding SMILES only. The dual model produced valid SMILES with improved features, considering a range of properties including aromatic ring counts, heavy atom count, synthetic accessibility, and a new fragment complexity score we term Feature Complexity (FeCo). Additionally, we demonstrate that generative performance is further enhanced by use of a simple syntax-correction procedure during training, in which invalid and undesirable SMILES are spiked into the training set. Finally, we used the syntax-corrected model to generate a library of novel candidate privileged fragments.
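The particle swarm optimization sampling mentioned in the description can be illustrated with a minimal, generic sketch. This is not the authors' implementation: the function names, hyperparameters, search bounds, and `toy_score` are all hypothetical. In the setting described above, the score would come from decoding a latent vector into a molecule and evaluating it; here a simple quadratic stands in so that the example runs with the Python standard library alone.

```python
import random


def particle_swarm_maximize(score, dim, n_particles=20, n_iters=50,
                            inertia=0.7, c_personal=1.5, c_global=1.5,
                            bounds=(-1.0, 1.0), seed=0):
    """Generic PSO: search a box in R^dim for a point that maximizes `score`."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random starting positions, zero starting velocities.
    positions = [[rng.uniform(lo, hi) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best point; the swarm shares a global best.
    best_pos = [p[:] for p in positions]
    best_val = [score(p) for p in positions]
    g = max(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to personal best, pull to global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                # Move and clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            val = score(positions[i])
            if val > best_val[i]:
                best_val[i], best_pos[i] = val, positions[i][:]
                if val > global_val:
                    global_val, global_pos = val, positions[i][:]
    return global_pos, global_val


def toy_score(z):
    # Hypothetical stand-in for "decode z into a molecule and score it":
    # the score is highest (zero) at an arbitrary target point.
    target = [0.3, -0.2, 0.5]
    return -sum((a - b) ** 2 for a, b in zip(z, target))
```

Calling `particle_swarm_maximize(toy_score, dim=3)` returns the best point found and its score. In a generative workflow the latent dimension and the scoring function would come from the trained model rather than from a fixed target.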
Created:
2021-05-24



