
CMLM/ZhongJing-OMNI

Source: Hugging Face · Updated 2024-10-17 · Indexed 2025-04-19
Download link:
https://hf-mirror.com/datasets/CMLM/ZhongJing-OMNI
Description:
---
license: mit
task_categories:
- question-answering
- text-generation
tags:
- medical
---

# ZhongJing-OMNI: The First Multimodal Benchmark for Evaluating Traditional Chinese Medicine

**ZhongJing-OMNI** is the first multimodal benchmark dataset designed to evaluate Traditional Chinese Medicine (TCM) knowledge in large language models. This dataset provides a diverse array of questions and multimodal data, combining visual and textual information to assess the model's ability to reason through complex TCM diagnostic and therapeutic scenarios. The unique combination of TCM textual knowledge with multimodal tongue diagnosis data sets a new standard for AI research in TCM.

### Key Multimodal Features:

- **Multiple-choice questions**: Encompassing core TCM concepts, syndromes, diagnostics, and herbal formulas.
- **Open-ended questions**: Focused on detailed diagnostic reasoning, treatment strategies, and explanation of TCM principles.
- **Case-based questions**: Real-world clinical cases that require in-depth analysis and comprehensive treatment approaches.
- **Multimodal tongue diagnosis Q&A**: High-resolution tongue images paired with corresponding diagnostic questions and expert answers, combining both visual and textual data to evaluate the model's understanding of TCM tongue diagnosis.

This multimodal dataset allows AI systems to develop a deeper, more holistic understanding of TCM by integrating textual reasoning with visual diagnostic skills, making it a powerful resource for healthcare AI research.

## Dataset Structure

- `MCQ/`: Multiple-choice questions with answer keys.
- `OpenQA/`: Open-ended questions with detailed expert-verified answers.
- `CaseQA/`: Clinical case-based questions and answers.
- `TongueDiagnosis/`: High-quality tongue diagnosis images with paired Q&A for multimodal analysis.

## How to Use

### 1. Clone the repository:

```bash
git clone https://github.com/yourusername/ZhongJing-OMNI.git
```

### 2. Load the dataset:

```python
import pandas as pd

# Load multiple-choice data
mcq_data = pd.read_csv('MCQ/questions.csv')

# Load open-ended Q&A
openqa_data = pd.read_csv('OpenQA/questions.csv')

# Load case-based Q&A
caseqa_data = pd.read_csv('CaseQA/questions.csv')

# Load tongue diagnosis Q&A (multimodal data)
tongue_data = pd.read_csv('TongueDiagnosis/tongue_questions.csv')
```

### 3. Multimodal Tongue Diagnosis Example:

```python
from PIL import Image

# Load and display an example tongue image for multimodal evaluation
img = Image.open('TongueDiagnosis/images/tongue001.png')
img.show()

# Load the corresponding Q&A
with open('TongueDiagnosis/questions/tongue001_question.txt', 'r') as file:
    question = file.read()
print(f"Question: {question}")

with open('TongueDiagnosis/answers/tongue001_answer.txt', 'r') as file:
    answer = file.read()
print(f"Answer: {answer}")
```

## Why Multimodal?

The ZhongJing-OMNI dataset introduces the first multimodal component for TCM, combining visual and textual data, which is crucial for understanding complex diagnostic features such as tongue color, shape, and coating. This allows models to:

- Learn how to integrate visual diagnostic features with textual knowledge.
- Perform joint reasoning over both modalities to reach accurate TCM diagnoses.
- Support real-world clinical applications where visual and textual data are intertwined.
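In practice, joint reasoning over the two modalities starts by pairing each tongue image with its question and expert answer before handing both to a vision-language model. The sketch below is an illustration, not official dataset tooling: it assumes the directory layout and file naming shown in step 3 above, and `answer_with_image` is a hypothetical hook standing in for whatever multimodal model is under evaluation.

```python
import os
from PIL import Image

def answer_with_image(image, question):
    """Hypothetical model hook: replace with a real vision-language model call."""
    return "model answer goes here"

image_dir = 'TongueDiagnosis/images'
question_dir = 'TongueDiagnosis/questions'
answer_dir = 'TongueDiagnosis/answers'

results = []
for image_name in sorted(os.listdir(image_dir)):
    case_id = os.path.splitext(image_name)[0]  # e.g. "tongue001"
    image = Image.open(os.path.join(image_dir, image_name))

    # Paired question and expert answer, following the naming used in step 3
    with open(os.path.join(question_dir, f'{case_id}_question.txt'), 'r') as f:
        question = f.read()
    with open(os.path.join(answer_dir, f'{case_id}_answer.txt'), 'r') as f:
        reference = f.read()

    prediction = answer_with_image(image, question)
    results.append({'case': case_id, 'question': question,
                    'prediction': prediction, 'reference': reference})
```

How `prediction` is compared against `reference` (expert rubric, LLM judge, or other protocol) is left to the evaluation setup you choose.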
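The text-only tracks can be scored conventionally against the provided answer keys. A minimal sketch, assuming the MCQ CSV exposes `question`, `options`, and `answer` columns (the actual schema may differ) and using a hypothetical `predict_choice` hook for the model under test:

```python
import pandas as pd

def predict_choice(question, options):
    """Hypothetical model hook: replace with a real LLM call that returns a choice letter."""
    return "A"

mcq = pd.read_csv('MCQ/questions.csv')

predictions = [predict_choice(row['question'], row['options'])
               for _, row in mcq.iterrows()]
accuracy = (pd.Series(predictions) == mcq['answer']).mean()
print(f"MCQ accuracy: {accuracy:.2%}")
```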
## Tongue Diagnosis Example: Qi Deficiency with Pale Tongue

![Qi Deficiency Pale Tongue](demo.png)

This image shows a pale, slightly swollen tongue with a thin white coating. These features are typical signs of Qi deficiency in Traditional Chinese Medicine. This example represents an actual test result from our dataset using the Claude-3.5-Sonnet model and demonstrates the model's capability to accurately identify and describe key features of tongue images used in TCM diagnosis.

## Contact

For questions or collaboration, please email ylkan21@m.fudan.edu.cn.

## Citation

If you use ZhongJing-OMNI in your research or project, please cite it as follows:

```
@dataset{zhongjing_omni_2024,
  title     = {ZhongJing-OMNI: The First Multimodal Benchmark for Evaluating Traditional Chinese Medicine},
  author    = {Kang, Yanlan},
  year      = {2024},
  publisher = {GitHub},
  journal   = {GitHub repository},
  url       = {https://github.com/yourusername/ZhongJing-OMNI}
}
```
Provider:
CMLM