PubMedVision
Source: ModelScope community (魔搭社区). Updated 2026-05-16; indexed 2025-01-25.
Download link:
https://modelscope.cn/datasets/FreedomIntelligence/PubMedVision
Resource description:
## News
- [2025/02/18]: We added the original captions of PubMedVision in `PubMedVision_Original_Caption.json`, as well as the Chinese version of PubMedVision in `PubMedVision_Chinese.json`.
- [2024/07/01]: We added annotations for the 'body_part' and 'modality' of images, produced with the [HuatuoGPT-Vision-7B](https://huggingface.co/FreedomIntelligence/HuatuoGPT-Vision-7B) model.
## PubMedVision
PubMedVision is a large-scale medical VQA dataset. We extracted high-quality image-text pairs from PubMed and used GPT-4V to reformat them to enhance their quality.
PubMedVision significantly improves the multimodal capabilities of MLLMs in the medical field. For more details, refer to our [paper](https://arxiv.org/abs/2406.19280) and [GitHub repository](https://github.com/FreedomIntelligence/HuatuoGPT-Vision).
## Data Volume
PubMedVision contains 1.3 million medical VQAs, divided into Alignment VQA and Instruction Tuning VQA:
| Data | # Data |
| ---------- | ---------- |
| PubMedVision_Alignment_VQA | 647,031 |
| PubMedVision_InstructionTuning_VQA | 647,031 |
| **Total** | **1,294,062** |
## Image Data
`images_*.zip` contains the compressed image data. You can unzip these images using the following code:
```bash
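mkdir -p images
for ((i=0; i<20; i++))
do
    # -j drops the directory paths stored in the archive; & runs each unzip in the background
    unzip -j images_$i.zip -d images/ &
done
wait  # block until every background unzip finishes; be patient, it takes a while...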
```
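Before running the extraction loop above, it can help to confirm that every archive was downloaded. The following is a minimal sketch, not part of the dataset repository: `missing_archives` is a hypothetical helper name, and the default count of 20 is an assumption taken from the loop bound `i<20`.

```bash
# Hypothetical helper: print the name of every expected archive
# images_0.zip .. images_(count-1).zip that is absent from a directory.
# The body runs in a subshell (parentheses) so its variables do not leak.
missing_archives() (
    dir="${1:-.}"      # directory to check; defaults to the current one
    count="${2:-20}"   # number of archives; 20 is assumed from the loop above
    i=0
    while [ "$i" -lt "$count" ]; do
        # Report the archive only when no regular file with that name exists
        [ -f "$dir/images_$i.zip" ] || echo "images_$i.zip"
        i=$((i + 1))
    done
)
```

Calling `missing_archives .` in the download directory prints nothing when all archives are present; any names it prints are archives that still need to be downloaded.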
## Citation
If you find our data useful, please consider citing our work! We are FreedomIntelligence from the [Shenzhen Research Institute of Big Data](http://sribd.cn/en) and [The Chinese University of Hong Kong, Shenzhen](https://sds.cuhk.edu.cn/en).
```
@misc{chen2024huatuogptvisioninjectingmedicalvisual,
title={HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale},
author={Junying Chen and Ruyi Ouyang and Anningzhe Gao and Shunian Chen and Guiming Hardy Chen and Xidong Wang and Ruifei Zhang and Zhenyang Cai and Ke Ji and Guangjun Yu and Xiang Wan and Benyou Wang},
year={2024},
eprint={2406.19280},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2406.19280},
}
```
Provider:
maas
Created:
2025-01-20
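Dataset introduction

Background overview
PubMedVision is a large-scale medical visual question answering (VQA) dataset containing about 1.3 million high-quality image-text pairs, extracted from PubMed and reformatted with GPT-4V, intended to strengthen the visual understanding of multimodal large language models in the medical domain. The dataset is split into Alignment VQA and Instruction Tuning VQA, each with about 647,000 entries. It also provides body-part and modality annotations for the images, as well as a Chinese version, to support a broad range of medical multimodal research.
The above summary was collected and generated by 遇见数据集.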



