TCM-Instruction-Tuning-ShizhenGPT

Name: TCM-Instruction-Tuning-ShizhenGPT
Creator: maas
Published: 2026-05-14 09:51:00
License: no description available

ModelScope community. Updated: 2026-05-14. Indexed: 2025-08-30.

Download link:

https://modelscope.cn/datasets/FreedomIntelligence/TCM-Instruction-Tuning-ShizhenGPT


Resource description:

# 📚 Introduction

This dataset is a fine-tuning dataset for [ShizhenGPT](https://github.com/FreedomIntelligence/ShizhenGPT), a multimodal LLM for **Traditional Chinese Medicine (TCM)**. We open-source 245K multimodal Chinese medicine instruction data, including text instructions, visual instructions, and signal instructions for TCM.

For details, see our [paper](https://arxiv.org/abs/2508.14706) and [GitHub repository](https://github.com/FreedomIntelligence/ShizhenGPT).

# 📊 Dataset Overview

The open-sourced fine-tuning dataset consists of three parts:

| Subset                  | Modality                      | Data Quantity |
| ----------------------- | ----------------------------- | ------------- |
| TCM Text Instructions   | 📝 Text                       | 87K           |
| TCM Visual Instructions | 📝 Text, 👁️ Visual            | 67K           |
| TCM Speech Instructions | 📝 Text, 👁️ Visual, 🎙️ Audio | 91K           |

> ⚠️ Note: Since TCM signal datasets, such as pulse and smell, involve private information, we recommend users download them from the corresponding paper.

# 📖 Citation

If you find our data useful, please consider citing our work!

```
@misc{chen2025shizhengptmultimodalllmstraditional,
  title={ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine},
  author={Junying Chen and Zhenyang Cai and Zhiheng Liu and Yunjin Yang and Rongsheng Wang and Qingying Xiao and Xiangyi Feng and Zhan Su and Jing Guo and Xiang Wan and Guangjun Yu and Haizhou Li and Benyou Wang},
  year={2025},
  eprint={2508.14706},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.14706},
}
```


Provider:

maas

Created:

2025-08-22


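Background overview

TCM-Instruction-Tuning-ShizhenGPT is a multimodal instruction-tuning dataset focused on Traditional Chinese Medicine (TCM). It contains 245K instruction samples spanning text, vision, and speech, and is used to train the ShizhenGPT multimodal large language model. The dataset has three parts: 87K text-only instructions, 67K text + visual instructions, and 91K text + visual + audio instructions. Data involving sensitive signals such as pulse and smell is not included because of privacy concerns.

The content above was collected and summarized by 遇见数据集.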
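
As a quick sanity check on the figures quoted above, the three subset sizes should add up to the stated 245K total. The sketch below hard-codes the rounded counts from the overview; the names `subset_sizes_k`, `total_k`, and `share` are illustrative only and are not part of the dataset or of any library.

```python
# Subset sizes in thousands of samples, copied from the dataset overview.
# These are the rounded figures given on the page, not exact record counts.
subset_sizes_k = {
    "TCM Text Instructions": 87,
    "TCM Visual Instructions": 67,
    "TCM Speech Instructions": 91,
}

# Sum of the three subsets, in thousands.
total_k = sum(subset_sizes_k.values())


def share(name: str) -> float:
    """Return the fraction of the total contributed by one subset."""
    return subset_sizes_k[name] / total_k


for name, size_k in subset_sizes_k.items():
    print(f"{name}: {size_k}K ({share(name):.1%})")
print(f"Total: {total_k}K")
```

Because the published counts are rounded to the nearest thousand, the computed shares are approximate.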
