sanggusti/synthvision-medical-vqa
收藏Hugging Face2026-04-08 更新2026-04-12 收录
下载链接:
https://hf-mirror.com/datasets/sanggusti/synthvision-medical-vqa
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators: []
language:
- en
language_creators: []
license: []
multilinguality:
- monolingual
pretty_name: 'synthvision_medical_vqa'
size_categories:
- 1K<n<10K
source_datasets:
- 'extended|OpenMed/synthvision-validated-kimi-by-qwen'
tags:
- adaption
- instruction-tuning
- medical
task_categories: []
task_ids: []
---

This dataset is a remastered version of this [dataset](OpenMed/synthvision-validated-kimi-by-qwen) prepared using [Adaption's](https://adaptionlabs.ai/app/auth) Adaptive Data platform.
# synthvision_medical_vqa
This dataset contains 55,382 cross-validated medical visual question answering pairs derived from the SynthVision pipeline, featuring multi-turn conversations between humans and assistants about clinical imaging findings. Each record includes an image path, a ShareGPT-formatted conversation covering findings, mechanisms, differentials, urgency, and management, along with clinical reports and structured findings. The annotations were generated by Kimi K2.5 and rigorously validated by Qwen 3.5 to ensure consistency with source images, removing hallucinated content to achieve a 93% pass rate.
### Dataset size
There are 1,000 data points in this dataset. This is an instruction tuning dataset.
### Quality of Remastered Dataset
The final quality is A, with a relative quality improvement of 98.0%.
### Domain
- Medical (100%)
### Language
- English (100%)
### Tone
- Technical (88%)
- Professional (12%)
### Evaluation Results
- **Quality Gains:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/fcdc875e-fc9c-47cd-9bfd-708a8cad0e2a.png" alt="QualityGains" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **Grade Improvement:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/23d22b61-f78b-4a70-9bca-69aaf3242fdd.png" alt="Grade" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **Percentile Chart:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/3a84e2bc-aff8-4114-9930-0af799e811e3.png" alt="Percentile Chart" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
注释创建者: []
语言:
- 英语
语言创建者: []
许可证: []
多语言属性:
- 单语言
展示名称: 'synthvision_medical_vqa'
规模类别:
- 1000 < 数据规模 < 10000
源数据集:
- '扩展自|OpenMed/synthvision-validated-kimi-by-qwen'
标签:
- 适配(Adaption)
- 指令微调(instruction-tuning)
- 医疗
任务类别: []
任务ID: []
---

本数据集是[OpenMed/synthvision-validated-kimi-by-qwen]数据集的重制版,通过[Adaption(Adaption Labs)的自适应数据平台](https://adaptionlabs.ai/app/auth)制作完成。
# synthvision_medical_vqa
本数据集包含55,382条经交叉验证的医疗视觉问答(Visual Question Answering, VQA)样本对,源自SynthVision流程,涵盖围绕临床影像发现展开的人机多轮交互对话。每条数据均包含图像路径、采用ShareGPT格式的对话内容(覆盖影像发现、发病机制、鉴别诊断、紧急程度与诊疗方案),以及临床报告与结构化发现结果。注释由Kimi K2.5生成,并经Qwen 3.5严格验证,以确保与源图像保持一致,剔除AI幻觉内容,最终通过率达93%。
### 数据集规模
本数据集包含1,000条数据样本,属于指令微调(instruction-tuning)数据集。
### 重制版数据集质量
最终质量评级为A级,相对质量提升幅度达98.0%。
### 应用领域
- 医疗(100%)
### 语言
- 英语(100%)
### 文本基调
- 专业技术类(88%)
- 专业正式类(12%)
### 评估结果
- **质量提升情况:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/fcdc875e-fc9c-47cd-9bfd-708a8cad0e2a.png" alt="质量提升" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **评级提升情况:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/23d22b61-f78b-4a70-9bca-69aaf3242fdd.png" alt="评级提升" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
- **百分位排名图表:**
<img src="https://proteus-prod-public.s3.us-east-1.amazonaws.com/temp/3a84e2bc-aff8-4114-9930-0af799e811e3.png" alt="百分位排名图表" style="max-width: 50%; display: block; margin-left: auto; margin-right: auto;" />
提供机构:
sanggusti



