datasheet2_A Deep Learning-Based Approach for Identifying the Medicinal Uses of Plant-Derived Natural Compounds.csv

Indexed from NIAID Data Ecosystem on 2026-03-12

Download link:

https://figshare.com/articles/dataset/datasheet2_A_Deep_Learning-Based_Approach_for_Identifying_the_Medicinal_Uses_of_Plant-Derived_Natural_Compounds_csv/13582418

Description:

Medicinal plants and their extracts have been used as important sources for drug discovery. In particular, plant-derived natural compounds, including phytochemicals, antioxidants, vitamins, and minerals, are gaining attention as they promote health and prevent disease. Although several in vitro methods have been developed to confirm the biological activities of natural compounds, there is still considerable room to reduce time and cost. To overcome these limitations, several in silico methods have been proposed for conducting large-scale analysis, but they are still limited in dealing with incomplete and heterogeneous natural compound data. Here, we propose a deep learning-based approach to identify the medicinal uses of natural compounds by exploiting massive and heterogeneous drug and natural compound data. The rationale behind this approach is that deep learning can effectively use heterogeneous features to compensate for incomplete information. Based on latent knowledge, molecular interactions, and chemical property features, we generated 686-dimensional feature vectors for 4,507 natural compounds and 2,882 approved and investigational drugs. The deep learning model was trained using the generated features and verified drug indication information. When the features of natural compounds were applied as input to the trained model, potential efficacies were successfully predicted with high accuracy, sensitivity, and specificity.
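
The description reports results in terms of accuracy, sensitivity, and specificity. As a reminder of how these three measures relate to binary predictions, here is a minimal Python sketch. It is not the authors' evaluation code: the names `ConfusionCounts`, `confusion_counts`, `accuracy`, `sensitivity`, and `specificity` are hypothetical, the labels are made-up toy values, and the layout of the CSV file itself is not assumed.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of the four outcomes of a binary prediction task."""
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive


def confusion_counts(y_true, y_pred):
    """Tally outcomes for two equal-length sequences of 0/1 labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def accuracy(c):
    """Fraction of all predictions that are correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else float("nan")


def sensitivity(c):
    """Fraction of actual positives that are predicted positive."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else float("nan")


def specificity(c):
    """Fraction of actual negatives that are predicted negative."""
    negatives = c.tn + c.fp
    return c.tn / negatives if negatives else float("nan")


# Toy labels for illustration only; these are not values from the dataset.
toy_true = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 0, 1, 0, 0, 1, 0, 1]
toy_counts = confusion_counts(toy_true, toy_pred)
print(accuracy(toy_counts), sensitivity(toy_counts), specificity(toy_counts))
```

Reporting all three measures together is useful when positive and negative classes are imbalanced, since accuracy alone can look high even when one class is predicted poorly.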

Created:

2021-01-15
