biomed-visual-instructions
Source: ModelScope community (魔搭社区); updated 2025-12-10, indexed 2025-01-11
Download link: https://modelscope.cn/datasets/AdaptLLM/biomed-visual-instructions
# Adapting Multimodal Large Language Models to Domains via Post-Training (EMNLP 2025)
This repo contains the **biomedicine visual instructions for post-training MLLMs** from our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).
The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)
## Data Information
Using our [visual instruction synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer), we generate visual instruction tasks based on the image-caption pairs from [PubMedVision](https://huggingface.co/datasets/FreedomIntelligence/PubMedVision) (referred to as `PMC_refined` in our paper). These synthetic tasks, combined with the original image captioning tasks, are used to train general MLLMs through a single-stage post-training process.
- **image_caption_pairs.json**: Derived from [PubMedVision_Alignment_VQA](https://huggingface.co/datasets/FreedomIntelligence/PubMedVision/blob/main/PubMedVision_Alignment_VQA.json) in PubMedVision. Only single-image examples are kept, resulting in 500K image-caption pairs.
- **synthetic_visual_tasks.json**: Synthesized based on the aforementioned image-caption pairs, containing 144K synthetic instruction-response pairs after applying our consistency-based data filter.
- **image_caption_and_synthetic_task.json**: This dataset is used to reproduce our `single-stage domain-specific post-training`, containing both image-captioning tasks and synthetic visual-instruction tasks, totaling 500K examples (equal to the number of image-caption pairs).
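As a quick sanity check on the sizes listed above, the following sketch counts the top-level entries of a downloaded file. It assumes each file is a single top-level JSON array (the file layout is not specified here) and that `python3` is on the `PATH`, which is plausible since `huggingface_hub` is installed with pip. `count_records` is a hypothetical helper name, not part of the repository.

```bash
# Print the number of top-level entries in a JSON file.
# Assumption: the file holds one top-level JSON array; anything else is an error.
count_records() {
  python3 -c 'import json, sys
with open(sys.argv[1], encoding="utf-8") as f:
    data = json.load(f)
if not isinstance(data, list):
    sys.exit("expected a top-level JSON array")
print(len(data))' "$1"
}
```

After downloading, `count_records ./biomed-visual-instructions/synthetic_visual_tasks.json` would print a count to compare against the approximate figures in the list above.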
## To Download the Data
1. Set up dependencies:
```bash
pip install "huggingface_hub[cli]"
```
2. Download text data:
```bash
REPO="AdaptLLM/biomed-visual-instructions"
# The local directory where you intend to save the files
LOCAL_DIR="./biomed-visual-instructions"
# Choose from ["image_caption_and_synthetic_task.json", "image_caption_pairs.json", "synthetic_visual_tasks.json"]
FILE="image_caption_and_synthetic_task.json" # This is used for reproducing AdaMLLM in our paper.
huggingface-cli download --resume-download ${REPO} ${FILE} --local-dir ${LOCAL_DIR} --repo-type dataset
```
3. Download image data:
```bash
REPO="FreedomIntelligence/PubMedVision"
huggingface-cli download --resume-download ${REPO} --local-dir ${LOCAL_DIR} --repo-type dataset --include "images_*.zip"
```
4. Unzip the downloaded images:
```bash
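cd ${LOCAL_DIR}
# Extract all 20 archives in parallel; -j drops archive paths so every image lands directly in images/
for ((i=0; i<20; i++))
do
    unzip -j images_$i.zip -d images/ & # Wait patiently, it takes a while...
done
wait # Block until every background unzip job has finished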
```
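If extraction fails or the image folder looks incomplete, a first check is whether every archive was actually downloaded. The sketch below lists any expected archive that is missing or empty. It assumes the archives are named `images_0.zip` through `images_19.zip`, as the extraction loop above implies. `missing_archives` is a hypothetical helper, not part of either repository.

```bash
# Print the name of every expected archive that is missing or empty in DIR.
# COUNT defaults to 20, matching the range the extraction loop iterates over.
# The body runs in a subshell, so its variables do not leak into the caller.
missing_archives() (
  dir="$1"
  count="${2:-20}"
  i=0
  while [ "$i" -lt "$count" ]; do
    # -s is true only for files that exist and have a size greater than zero
    [ -s "${dir}/images_${i}.zip" ] || echo "images_${i}.zip"
    i=$((i + 1))
  done
)
```

For example, `missing_archives "${LOCAL_DIR}"` prints nothing when all 20 archives are present; any file it lists can be fetched again with the download command from step 3.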
## To reproduce the data
We have included detailed scripts to reproduce the data in [Synthesis.md](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Synthesis.md).
## To train MLLMs using the data
Our training data can be easily used to train MLLMs based on the `Llava` repository or the `LLaMA Factory` repository. Please refer to the [Post-Train Guide](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Post_Train.md).
## Citation
If you find our work helpful, please cite us.
[Adapt MLLM to Domains](https://huggingface.co/papers/2411.19930) (EMNLP 2025 Findings)
```bibtex
@article{adamllm,
title={On Domain-Adaptive Post-Training for Multimodal Large Language Models},
author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
journal={arXiv preprint arXiv:2411.19930},
year={2024}
}
```
[Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)
```bibtex
@inproceedings{
adaptllm,
title={Adapting Large Language Models via Reading Comprehension},
author={Daixuan Cheng and Shaohan Huang and Furu Wei},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=y886UXPEZ0}
}
```
Provider: maas
Created: 2025-01-08



