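Dataset of Traditional Chinese Medicine Ingredient Properties and Efficacy for TCM R&D, Pharmacological Research, and Drug Molecule Design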
Source: Tianjin Data Intellectual Property Registration Platform. Updated: 2025-06-26. Indexed: 2025-07-09.
Download link:
https://dengji.tjippc.cn/xxgg_nr?id=4642162b-8598-44b9-a302-b6e343aa16f2
Resource description:
1. Data Collection: Data on 50,114 traditional Chinese medicine (TCM) ingredients were sourced from the public database LTM-TCM, and data on 45 proteins and genes were obtained from the public PPDbind database. All relevant database usage terms were strictly followed during collection to ensure that the data were acquired legally and compliantly.
2. Molecular Properties of TCM Ingredients: Physicochemical parameters such as molecular weight (MW) and molecular refractivity (MR), together with medicinal chemistry parameters such as the quantitative estimate of drug-likeness (QED) and the synthetic accessibility score (SAscore), were calculated to characterize each TCM ingredient quantitatively along multiple dimensions.
3. Natural Product Efficacy Prediction Algorithm: A self-developed natural product large model was used to generate molecular representations of the TCM ingredients. Based on their chemical structures and existing TCM efficacy data, an efficacy prediction model was built to predict previously unknown efficacy, yielding potential efficacy predictions for all 50,114 TCM ingredients.
4. Molecular Docking: Using the AutoDock molecular docking algorithm, 45 GPCR proteins with high-quality three-dimensional structures were docked against the more than 50,000 TCM ingredients. The resulting Vina Score is used to evaluate how strongly each ingredient is predicted to bind a specific protein or gene, and thereby to infer its likely targets and potential biological activity.
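As a small illustration of the simplest property in item 2, molecular weight can be computed by summing atomic weights over a molecular formula. The sketch below is not the dataset's pipeline: the function name `molecular_weight`, the abbreviated atomic-weight table, and the flat-formula input format are all assumptions made for this example. Structure-dependent parameters such as MR, QED, and SAscore need a full cheminformatics toolkit and are not shown.

```python
import re

# Standard atomic weights (g/mol) for a few common elements.
# This table is deliberately short; extend it for other elements.
ATOMIC_WEIGHTS = {
    "H": 1.008,
    "C": 12.011,
    "N": 14.007,
    "O": 15.999,
    "P": 30.974,
    "S": 32.06,
}


def molecular_weight(formula: str) -> float:
    """Return the molecular weight of a flat formula such as 'C15H10O7'.

    Hypothetical helper: brackets, charges, and hydrates are not supported.
    """
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    # Reject input that the simple element/count pattern did not fully consume.
    if not tokens or "".join(sym + cnt for sym, cnt in tokens) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in tokens:
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element: {symbol}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


# Quercetin (C15H10O7), a flavonoid found in many medicinal plants.
print(round(molecular_weight("C15H10O7"), 2))  # 302.24
```

Working from a formula keeps the example dependency-free; a real workflow would more likely start from a structure representation so that the other descriptors can be derived from the same input.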
The dataset produced by the above algorithmic rules can provide precise TCM ingredient data for TCM research, supporting the exploration of mechanisms of action and the optimization of prescriptions. It can also provide a scientific medication reference for clinical diagnosis and treatment and help in developing precise treatment plans.
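One way docking scores of this kind might be used to shortlist candidate targets is sketched below. The record layout, the ingredient and target names, the score values, and the `-7.0` cutoff are all hypothetical and are not taken from the dataset; the only property relied on is that Vina scores are predicted binding energies in kcal/mol, so more negative values indicate stronger predicted binding.

```python
# Hypothetical records: (ingredient, target, vina_score in kcal/mol).
# All names and values are made up for illustration.
RECORDS = [
    ("ingredient_A", "GPCR_1", -9.2),
    ("ingredient_A", "GPCR_2", -6.1),
    ("ingredient_B", "GPCR_1", -7.4),
    ("ingredient_B", "GPCR_2", -8.8),
    ("ingredient_C", "GPCR_1", -5.0),
]


def best_targets(records, cutoff=-7.0):
    """Map each ingredient to its strongest predicted target.

    Only scores at or below `cutoff` are considered; the cutoff is an
    arbitrary assumption for this sketch, not a recommended threshold.
    """
    best = {}
    for ingredient, target, score in records:
        if score > cutoff:
            continue  # weaker than the cutoff, skip
        # Keep the most negative (strongest predicted) score per ingredient.
        if ingredient not in best or score < best[ingredient][1]:
            best[ingredient] = (target, score)
    return best


shortlist = best_targets(RECORDS)
print(shortlist)
# {'ingredient_A': ('GPCR_1', -9.2), 'ingredient_B': ('GPCR_2', -8.8)}
```

A docking score is a computational estimate, so a shortlist like this is a starting point for further study and not evidence of activity.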
Provided by:
天津天士力数智中医药科技有限公司
Created:
2025-06-24
Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
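This dataset focuses on the properties and efficacy of TCM ingredients. It contains 50,114 TCM ingredient records and is updated annually. Molecular properties are computed algorithmically, efficacy is predicted, and molecular docking is performed, giving quantitative support for TCM R&D. It is suited to scenarios such as screening new TCM drugs, researching pharmacological mechanisms, and designing drug molecules, helping research institutions and companies improve R&D efficiency, reduce risk, and advance the modernization of TCM.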
The above content was collected and summarized by 遇见数据集.



