
food-visual-instructions

ModelScope community, updated 2025-12-04, indexed 2025-01-11
Download link:
https://modelscope.cn/datasets/AdaptLLM/food-visual-instructions
# Adapting Multimodal Large Language Models to Domains via Post-Training (EMNLP 2025)

This repo contains the **food visual instructions for post-training MLLMs** in our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).

The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)

## Data Information

Using our [visual instruction synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer), we generate visual instruction tasks based on the image-caption pairs from the [extended Recipe1M+ dataset](https://www.kaggle.com/datasets/saldenisov/recipenlg/data). These synthetic tasks, combined with the original image captioning tasks, are used to train general MLLMs through a single-stage post-training process.

- **image_caption_pairs.json**: Derived from the extended Recipe1M+ dataset. Only single-image examples are included, resulting in 131K image-caption pairs.
- **synthetic_visual_tasks.json**: Synthesized from the image-caption pairs above. Contains 39K synthetic instruction-response pairs after applying our consistency-based data filter.
- **image_caption_and_synthetic_task.json**: Used to reproduce our `single-stage domain-specific post-training`. Contains both image-captioning tasks and synthetic visual-instruction tasks, totaling 130K examples (equal to the number of image-caption pairs).

## To Download the Data

1. Set up dependencies:
```bash
pip install "huggingface_hub[cli]"
```

2. Download text data:
```bash
REPO="AdaptLLM/food-visual-instructions"

# The local directory where you intend to save the files
LOCAL_DIR="./food-visual-instructions"

# Choose from ["image_caption_and_synthetic_task.json", "image_caption_pairs.json", "synthetic_visual_tasks.json"]
FILE="image_caption_and_synthetic_task.json" # This is used for reproducing AdaMLLM in our paper.

huggingface-cli download --resume-download ${REPO} ${FILE} --local-dir ${LOCAL_DIR} --repo-type dataset
```

3. Download image data:
```bash
REPO="AdaptLLM/food-visual-instructions"
huggingface-cli download --resume-download ${REPO} --local-dir ${LOCAL_DIR} --repo-type dataset --include "images_*.zip"
```

4. Unzip the downloaded images:
```bash
cd ${LOCAL_DIR}

for ((i=0; i<10; i++))
do
    unzip -j images_$i.zip -d images/ &  # Each archive is extracted in a background job
done
wait  # Block until all background unzip jobs finish; this takes a while
```

## To reproduce the data

We have included detailed scripts to reproduce the data in [Synthesis.md](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Synthesis.md).

## To train MLLMs using the data

Our training data can be used directly to train MLLMs with the `Llava` repository or the `LLaMA Factory` repository. Please refer to the [Post-Train Guide](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Post_Train.md).

## Citation

If you find our work helpful, please cite us.

[Adapt MLLM to Domains](https://huggingface.co/papers/2411.19930) (EMNLP 2025 Findings)
```bibtex
@article{adamllm,
  title={On Domain-Adaptive Post-Training for Multimodal Large Language Models},
  author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
  journal={arXiv preprint arXiv:2411.19930},
  year={2024}
}
```

[Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)
```bibtex
@inproceedings{
adaptllm,
title={Adapting Large Language Models via Reading Comprehension},
author={Daixuan Cheng and Shaohan Huang and Furu Wei},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=y886UXPEZ0}
}
```
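
For readers without the `unzip` tool, the image extraction in step 4 of the download instructions can be approximated in Python. The following is a minimal sketch, not part of the original release: the helper name `extract_flat` is hypothetical, and the layout (`images_0.zip` through `images_9.zip` inside `./food-visual-instructions`) is assumed from the shell loop in step 4. Like `unzip -j`, it discards the directory structure stored in each archive and keeps only the file names.

```python
import shutil
import zipfile
from pathlib import Path


def extract_flat(zip_path, out_dir):
    """Extract every file in zip_path into out_dir, dropping archive subdirectories.

    Hypothetical helper that mirrors `unzip -j`: only the base name of each
    member is kept. Returns the list of written file paths.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            if member.is_dir():
                continue  # directory entries carry no data once paths are flattened
            target = out_dir / Path(member.filename).name
            with archive.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


if __name__ == "__main__":
    # Assumed layout, taken from the shell loop in step 4.
    local_dir = Path("./food-visual-instructions")
    for i in range(10):
        archive_path = local_dir / f"images_{i}.zip"
        if archive_path.exists():
            files = extract_flat(archive_path, local_dir / "images")
            print(f"{archive_path.name}: {len(files)} files extracted")
```

Because paths are flattened, two members with the same file name would overwrite each other, so this sketch is only appropriate if image file names are unique across the archives.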

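
After downloading, it can be useful to check the three JSON files against the sizes quoted in the dataset description (131K, 39K, and 130K entries). The sketch below is an illustration under stated assumptions: it assumes each file holds a top-level JSON list, which the dataset card does not confirm, and it makes no assumption about field names. The helper name `summarize_json` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_json(path):
    """Return (number of records, Counter of field names) for a JSON file.

    Assumes the file holds a top-level list (an assumption, not documented).
    Field names are counted only for records that are dicts, so no
    particular schema is required.
    """
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    if not isinstance(records, list):
        raise ValueError(f"expected a top-level list, got {type(records).__name__}")
    fields = Counter()
    for record in records:
        if isinstance(record, dict):
            fields.update(record.keys())
    return len(records), fields


if __name__ == "__main__":
    # Directory name assumed from LOCAL_DIR in the download instructions.
    local_dir = Path("./food-visual-instructions")
    for name in (
        "image_caption_pairs.json",
        "synthetic_visual_tasks.json",
        "image_caption_and_synthetic_task.json",
    ):
        path = local_dir / name
        if path.exists():
            count, fields = summarize_json(path)
            print(f"{name}: {count} records, fields: {sorted(fields)}")
```

Printing the field names first is a cheap way to learn the actual record layout before writing any training-side loader.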
Provider:
maas
Created:
2025-01-08