
TCM-Instruction-Tuning-ShizhenGPT

ModelScope community (魔搭社区) · Updated 2026-05-14 · Indexed 2025-08-30
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/TCM-Instruction-Tuning-ShizhenGPT
Resource description:
# 📚 Introduction

This dataset is a fine-tuning dataset for [ShizhenGPT](https://github.com/FreedomIntelligence/ShizhenGPT), a multimodal LLM for **Traditional Chinese Medicine (TCM)**. We open-source 245K multimodal Chinese medicine instruction data, including text instructions, visual instructions, and signal instructions for TCM.

For details, see our [paper](https://arxiv.org/abs/2508.14706) and [GitHub repository](https://github.com/FreedomIntelligence/ShizhenGPT).

# 📊 Dataset Overview

The open-sourced fine-tuning dataset consists of three parts:

| Subset                  | Modality                       | Data Quantity |
| ----------------------- | ------------------------------ | ------------- |
| TCM Text Instructions   | 📝 Text                        | 87K           |
| TCM Visual Instructions | 📝 Text, 👁️ Visual             | 67K           |
| TCM Speech Instructions | 📝 Text, 👁️ Visual, 🎙️ Audio   | 91K           |

> ⚠️ Note: Since TCM signal datasets, such as pulse and smell, involve private information, we recommend users download them from the corresponding paper.

# 📖 Citation

If you find our data useful, please consider citing our work!

```
@misc{chen2025shizhengptmultimodalllmstraditional,
  title={ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine},
  author={Junying Chen and Zhenyang Cai and Zhiheng Liu and Yunjin Yang and Rongsheng Wang and Qingying Xiao and Xiangyi Feng and Zhan Su and Jing Guo and Xiang Wan and Guangjun Yu and Haizhou Li and Benyou Wang},
  year={2025},
  eprint={2508.14706},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2508.14706},
}
```
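As a quick sanity check on the overview table, the three subset sizes can be summed and compared against the stated 245K total. The sketch below is only illustrative: the dictionary and the `summarize` helper are hypothetical names introduced here, and the sizes are the rounded "K" figures from the table, not exact record counts.

```python
# Subset sizes as listed in the overview table, in thousands of instructions.
# These are the rounded figures from the table, not exact record counts.
SPLITS_K = {
    "TCM Text Instructions": 87,
    "TCM Visual Instructions": 67,
    "TCM Speech Instructions": 91,
}


def summarize(splits):
    """Return the total size and each subset's share of it (percent, 1 decimal)."""
    total = sum(splits.values())
    shares = {name: round(100 * size / total, 1) for name, size in splits.items()}
    return total, shares


total_k, shares = summarize(SPLITS_K)
print(f"total: {total_k}K")
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The total matches the 245K quoted in the introduction, so the three listed subsets account for the whole open-sourced release.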

Provider: maas
Created: 2025-08-22
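Background overview
TCM-Instruction-Tuning-ShizhenGPT is a multimodal instruction-tuning dataset focused on Traditional Chinese Medicine (TCM). It contains 245K instructions spanning text, visual, and speech data, and is used to train the ShizhenGPT multimodal large language model. The dataset has three parts: 87K text-only instructions, 67K text + visual instructions, and 91K text + visual + audio instructions. Data involving sensitive signals such as pulse and smell is not included because of privacy concerns.
The summary above was collected and generated by 遇见数据集.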