Distributed Representation of Chemical Fragments
Source: NIAID Data Ecosystem (indexed 2026-03-10)
Download link:
https://figshare.com/articles/dataset/Distributed_Representation_of_Chemical_Fragments/5965126
Description:
This article describes an unsupervised machine learning method
for computing distributed vector representations of molecular fragments.
These vectors encode fragment features in a continuous high-dimensional
space and enable similarity computation between individual fragments,
even for small fragments with only two heavy atoms. The method is
based on a word embedding algorithm borrowed from the field of natural
language processing, and approximately 6 million unlabeled PubChem chemicals
were used for training. The resulting dense fragment vectors are in
contrast to the traditional sparse “one-hot” fragment
representation and capture rich relational structure in the fragment
space. The vectors of small linear fragments were averaged to yield
distributed vectors of bigger fragments and molecules, which were
used for different tasks, e.g., clustering, ligand recall, and quantitative
structure–activity relationship modeling. The distributed vectors
were found to be better than standard binary fingerprints at clustering
ring systems and recalling kinase ligands. This work demonstrates
unsupervised learning of fragment chemistry from large sets of unlabeled
chemical structures and subsequent application to supervised training
on relatively small data sets of labeled chemicals.
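
The averaging and similarity steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the fragment labels, the 4-dimensional vector values, and the function names are all hypothetical, chosen only to show how dense fragment vectors might be averaged into a molecule-level vector and compared by cosine similarity, and how that differs from a one-hot encoding. Training the embedding itself is not shown.

```python
import numpy as np

# Hypothetical 4-dimensional fragment vectors (illustrative values only).
# Learned embeddings would come from training on a large corpus and would
# typically have many more dimensions.
fragment_vectors = {
    "C-C": np.array([0.9, 0.1, 0.0, 0.2]),
    "C-N": np.array([0.7, 0.3, 0.1, 0.1]),
    "C=O": np.array([0.1, 0.8, 0.5, 0.0]),
    "C-O": np.array([0.2, 0.7, 0.4, 0.1]),
}


def average_vector(fragments):
    """Average the vectors of the listed fragments into one dense vector."""
    return np.mean([fragment_vectors[f] for f in fragments], axis=0)


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def one_hot(fragment):
    """Sparse one-hot encoding over the same (hypothetical) vocabulary."""
    vocabulary = sorted(fragment_vectors)
    vector = np.zeros(len(vocabulary))
    vector[vocabulary.index(fragment)] = 1.0
    return vector


# Molecule-level vectors built by averaging small-fragment vectors.
mol_a = average_vector(["C-C", "C-N"])
mol_b = average_vector(["C=O", "C-O"])
mol_c = average_vector(["C-C", "C-N", "C-C"])

# Dense vectors give graded similarity between distinct fragments,
# whereas distinct one-hot vectors are always orthogonal.
dense_sim = cosine_similarity(fragment_vectors["C=O"], fragment_vectors["C-O"])
sparse_sim = cosine_similarity(one_hot("C=O"), one_hot("C-O"))
```

In this toy setup, `mol_a` and `mol_c` share their fragments and so end up closer to each other than either is to `mol_b`, while `sparse_sim` is zero for any two distinct fragments regardless of their chemistry.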
Created:
2018-03-08



