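Full knowledge graph dataset for new TCM drug development, pharmacological research, and TCM theory research
Source: Tianjin Data Intellectual Property Registration Platform. Updated 2025-09-09; indexed 2025-09-22.
Download link: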
https://dengji.tjippc.cn/xxgg_nr?id=d54e255d-8685-498c-9e2a-cce42285db81
Resource description:
1. Knowledge extraction stage: In building the traditional Chinese medicine (TCM) knowledge graph, the knowledge extraction stage uses deep learning to process multi-source heterogeneous data systematically. A hybrid BiLSTM-CRF neural network model and large language models (LLMs) perform semantic parsing on more than 2,000 ancient TCM texts, including *Huangdi Neijing* (Inner Canon of Huangdi) and *Shanghan Lun* (Treatise on Febrile Diseases), and "disease-syndrome-symptom" diagnostic triples (entity relations) are extracted from more than 20 public TCM datasets.
2. Knowledge fusion workflow: To address terminology discrepancies and inconsistent standards across multi-source TCM data, a strict knowledge fusion system was established. First, a TCM terminology mapping dictionary was built on the ICD-11 international standard, and more than 120,000 professional terms from ancient texts, prescription databases, and modern literature were standardized and aligned using the national standards for prescriptions, syndromes, Chinese medicinal materials, and related categories. To handle conflicts in herb combinations among prescription data from different sources, a confidence-weighted conflict resolution algorithm was developed together with a multi-dimensional credibility evaluation model, which resolves differences in prescription composition across classical texts. The dataset produced by these algorithms and rules provides accurate TCM domain knowledge for TCM research, covering proprietary Chinese medicines, prescriptions, efficacy, syndromes, and symptoms.
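The listing does not publish the conflict resolution algorithm itself. As a minimal sketch of what confidence-weighted resolution of prescription composition could look like, the Python below keeps an ingredient when the confidence-weighted share of sources listing it reaches a threshold. The source names, confidence values, threshold, and the function `resolve_composition` are all hypothetical and chosen only for illustration; they are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical per-source confidence scores (illustrative values only).
SOURCE_CONFIDENCE = {
    "classic_text": 0.9,
    "prescription_db": 0.7,
    "modern_paper": 0.6,
}


def resolve_composition(records, threshold=0.6):
    """Merge conflicting ingredient lists for one prescription.

    records: list of (source_name, set_of_ingredients) pairs.
    An ingredient is kept when the summed confidence of the sources
    that list it, divided by the summed confidence of all sources,
    is at least `threshold`.
    """
    total = sum(SOURCE_CONFIDENCE[source] for source, _ in records)
    support = defaultdict(float)
    for source, ingredients in records:
        for ingredient in ingredients:
            support[ingredient] += SOURCE_CONFIDENCE[source]
    return sorted(
        ingredient
        for ingredient, weight in support.items()
        if weight / total >= threshold
    )


# Three sources disagree on the composition of the same prescription.
records = [
    ("classic_text", {"herb_a", "herb_b", "herb_c"}),
    ("prescription_db", {"herb_a", "herb_b", "herb_d"}),
    ("modern_paper", {"herb_a", "herb_c", "herb_d"}),
]
resolved = resolve_composition(records)
print(resolved)
```

In this example `herb_d` is listed only by the two lower-confidence sources, so its weighted share falls below the threshold and it is dropped, while the other three ingredients are kept. A multi-dimensional credibility model, as the description mentions, would presumably replace the fixed scores with values computed from several features of each source.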
Provider:
天津天士力数智中医药科技有限公司
Created:
2025-09-08
Collected summary
Dataset introduction

Background and challenges
Background overview
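This dataset is a full knowledge graph designed for new TCM drug development, pharmacological research, and TCM theory research. It contains more than 3 million structured records covering entity relations among symptoms, etiologies, prescriptions, and other entities. Multi-source TCM data is processed with deep learning and knowledge fusion algorithms to address terminology standardization and knowledge fragmentation. The dataset is intended for research institutions and enterprises, with the aim of improving clinical decision-making efficiency and shortening R&D cycles.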
The above content was collected and summarized by 遇见数据集.



