
TCM-SD

ModelScope Community | Updated 2025-12-04 | Indexed 2024-08-31
Download link: https://modelscope.cn/datasets/OmniData/TCM-SD
Resource description:
---
displayName: TCM-SD
labelTypes:
  - English Corpus
license:
  - CC BY-NC-SA 4.0
paperUrl: "https://arxiv.org/pdf/2203.10839.pdf"
publishDate: "2022"
publishUrl: "https://github.com/Borororo/ZY-BERT"
publisher:
  - Beijing University of Technology
  - Xuzhou City Hospital of Traditional Chinese Medicine
tags:
  - Traditional Chinese Medicine
---

# Dataset Introduction

## Introduction

Traditional Chinese Medicine (TCM) is a natural, safe, and effective therapy that has spread and been applied worldwide. Its distinctive diagnosis and treatment system requires a comprehensive analysis of patient symptoms that are hidden in clinical records written as free text. Previous studies have shown that this system can be digitized and made more intelligent with artificial intelligence (AI) techniques such as natural language processing (NLP). However, existing datasets lack the quality and quantity needed to support further development of data-driven AI in TCM.

In this paper, we therefore focus on the core task of the TCM diagnosis and treatment system, syndrome differentiation (SD), and introduce TCM-SD, the first public large-scale benchmark for SD. The benchmark contains 54,152 real clinical records covering 148 syndromes. In addition, we collected a large-scale unlabeled text corpus in the TCM domain and propose a domain-specific pre-trained language model called ZYBERT. We ran experiments with deep neural networks to establish strong performance baselines, reveal various challenges in SD, and demonstrate the potential of domain-specific pre-trained language models. Our study and analysis reveal opportunities to integrate knowledge from computer science and linguistics to explore the empirical validity of TCM theories.

## Download Dataset

:modelscope-code[]{type="git"}
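
The `:modelscope-code[]{type="git"}` directive is a placeholder that the hosting site renders as a git snippet, so the command itself does not appear in this text. A minimal sketch of how such a command could be built is shown here. It assumes the repository follows the `https://www.modelscope.cn/datasets/<owner>/<name>.git` URL pattern, which is not stated on this page; the helper name `clone_command` and the default target directory are hypothetical.

```python
from typing import List

# Dataset ID taken from the download link on this page.
DATASET_ID = "OmniData/TCM-SD"


def clone_command(dataset_id: str, target_dir: str = "TCM-SD") -> List[str]:
    """Build a git clone command for a ModelScope dataset repository.

    Assumption: the repository is reachable at
    https://www.modelscope.cn/datasets/<dataset_id>.git. Check the dataset
    page if the clone fails.
    """
    url = f"https://www.modelscope.cn/datasets/{dataset_id}.git"
    return ["git", "clone", url, target_dir]


if __name__ == "__main__":
    # Only prints the command; nothing is downloaded here.
    print(" ".join(clone_command(DATASET_ID)))
```

Passing the returned list to `subprocess.run(..., check=True)` would perform the clone, which requires git and network access. Note that the CC BY-NC-SA 4.0 license listed in the metadata restricts use to non-commercial purposes.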
Provider: maas
Created: 2024-07-01
Background overview
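TCM-SD is a large-scale public benchmark dataset focused on the syndrome differentiation task in Traditional Chinese Medicine. It contains 54,152 real clinical records covering 148 syndromes and is intended to support the application of AI techniques in TCM. The dataset is accompanied by a large-scale unlabeled TCM text corpus and the domain-specific pre-trained model ZYBERT to facilitate data-driven AI research. It is released under the CC BY-NC-SA 4.0 license.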
The summary above was collected and generated by 遇见数据集.