Guidelines for Recurrent Neural Network Transfer Learning-Based Molecular Generation of Focused Libraries
Source: NIAID Data Ecosystem (indexed 2026-03-11)
Download link:
https://figshare.com/articles/dataset/Guidelines_for_Recurrent_Neural_Network_Transfer_Learning-Based_Molecular_Generation_of_Focused_Libraries/12709722
Description:
Deep learning approaches have become popular in recent years in the field of de novo molecular design. While a variety of different methods are available, it is still a challenge to assess and compare their performance. A particularly promising approach for automated drug design is to use recurrent neural networks (RNNs) as SMILES generators and train them with the learning procedure called “transfer learning”. This involves first training the initial model on a large generic data set of molecules to learn the general syntax of SMILES, followed by fine-tuning on a smaller set of molecules, coming from, e.g., a lead optimization program. To create a well-performing transfer learning application which can be automated, it is important to understand how the size of the second data set affects the training process. In addition, extensive postfiltering using similarity metrics of the molecules generated after transfer learning should be avoided, as it can introduce new biases toward the selection of drug candidates. Here, we present results from the application of a gated recurrent unit cell (GRU)-RNN to transfer learning on data sets of varying sizes and complexity. Analysis of the results has allowed us to provide some general guidelines for transfer learning. In particular, we show that data sets containing at least 190 molecules are needed for effective GRU-RNN-based molecular generation using transfer learning. The methods presented here should be applicable generally to the benchmarking of other deep learning methodologies for molecule generation.
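
The two-stage procedure described above (generic pretraining, then fine-tuning on a small focused set) can be illustrated with a minimal sketch. This is not the GRU-RNN from the paper: to stay dependency-free it swaps in a character-bigram count model as a stand-in. The toy SMILES lists, the `<s>`/`</s>` markers, and the `weight` parameter are all assumptions made for illustration only.

```python
import random
from collections import defaultdict

# Hypothetical sequence markers; stored as separate tokens so they
# cannot collide with any SMILES character.
START, END = "<s>", "</s>"

def new_model():
    """Empty bigram table: model[prev][next] -> accumulated weight."""
    return defaultdict(lambda: defaultdict(float))

def train(model, smiles_list, weight=1.0):
    """Add weighted bigram counts for each SMILES string.

    Character-level tokenization is a simplification: two-letter
    atoms such as Cl or Br would be split into two tokens.
    """
    for smi in smiles_list:
        tokens = [START] + list(smi) + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += weight
    return model

def sample(model, rng, max_len=40):
    """Generate one string by walking the table from START until END."""
    out, prev = [], START
    for _ in range(max_len):
        successors = model[prev]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

# Toy data (assumed, not from the dataset).
generic = ["CCO", "CCN", "CCCC", "CC(=O)O", "c1ccccc1"]
focused = ["c1ccccc1O", "c1ccccc1N"]

# Stage 1: learn general SMILES statistics from the generic set.
model = train(new_model(), generic)
# Stage 2: continue training on the focused set; the larger weight is a
# crude stand-in for the emphasis that fine-tuning places on new data.
train(model, focused, weight=5.0)

rng = random.Random(0)
generated = [sample(model, rng) for _ in range(5)]
print(generated)
```

A bigram model has no memory beyond one character, so its output is often not valid SMILES (for example, unclosed rings). The sketch only shows how the second stage shifts the learned statistics toward the focused set; capturing full SMILES syntax is what the recurrent model in the paper is for.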
Created:
2020-07-13



