Improving Measures of Chemical Structural Similarity Using Machine Learning on Chemical–Genetic Interactions
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://figshare.com/articles/dataset/Improving_Measures_of_Chemical_Structural_Similarity_Using_Machine_Learning_on_Chemical_Genetic_Interactions/15066614
Description:
A common strategy for identifying molecules likely to possess a desired biological activity is to search large databases of compounds for high structural similarity to a query molecule that demonstrates this activity, under the assumption that structural similarity is predictive of similar biological activity. However, efforts to systematically benchmark the diverse array of available molecular fingerprints and similarity coefficients have been limited by a lack of large-scale datasets that reflect biological similarities of compounds. To elucidate the relative performance of these alternatives, we systematically benchmarked 11 different molecular fingerprint encodings, each combined with 13 different similarity coefficients, using a large set of chemical–genetic interaction data from the yeast Saccharomyces cerevisiae as a systematic proxy for biological activity. We found that the performance of different molecular fingerprints and similarity coefficients varied substantially and that the all-shortest path fingerprints paired with the Braun-Blanquet similarity coefficient provided superior performance that was robust across several compound collections. We further proposed a machine learning pipeline based on support vector machines that offered a fivefold improvement relative to the best unsupervised approach. Our results generally suggest that using high-dimensional chemical–genetic data as a basis for refining molecular fingerprints can be a powerful approach for improving prediction of biological functions from chemical structures.
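
To make the similarity-search step concrete, the following is a minimal sketch of ranking a compound library against a query with the Braun-Blanquet coefficient, with the Tanimoto coefficient shown for comparison. It assumes each fingerprint is already available as a set of on-bit indices; the function names and the toy fingerprints are hypothetical and are not taken from the dataset or from the authors' pipeline.

```python
from typing import Callable, Dict, List, Set, Tuple

Fingerprint = Set[int]  # assumption: a fingerprint is the set of its on-bit indices


def braun_blanquet(a: Fingerprint, b: Fingerprint) -> float:
    """Braun-Blanquet coefficient: |A & B| / max(|A|, |B|)."""
    largest = max(len(a), len(b))
    if largest == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / largest


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) coefficient: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        return 0.0  # same convention for two empty fingerprints
    return len(a & b) / union


def rank_by_similarity(
    query: Fingerprint,
    library: Dict[str, Fingerprint],
    coefficient: Callable[[Fingerprint, Fingerprint], float] = braun_blanquet,
) -> List[Tuple[str, float]]:
    """Score every library compound against the query, most similar first."""
    scored = [(name, coefficient(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy fingerprints, invented purely for illustration.
    query = {1, 2, 3, 4}
    library = {
        "cmpd_a": {1, 2, 3, 4, 5, 6},
        "cmpd_b": {1, 2, 9},
        "cmpd_c": {7, 8},
    }
    for name, score in rank_by_similarity(query, library):
        print(name, round(score, 3))
```

Passing `coefficient=tanimoto` to `rank_by_similarity` swaps the similarity measure without changing the ranking code, which mirrors the idea of benchmarking several coefficients over the same fingerprints.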
Created:
2021-07-28



