food-VQA-benchmark
Source: ModelScope community (updated 2026-01-02, indexed 2025-01-11)
Download link: https://modelscope.cn/datasets/AdaptLLM/food-VQA-benchmark
Description:
# Adapting Multimodal Large Language Models to Domains via Post-Training (EMNLP 2025)
This repo contains the **food visual instruction tasks for evaluating MLLMs** in our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).
The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)
## 1. Download Data
You can load datasets using the `datasets` library:
```python
from datasets import load_dataset
# Choose the task name from the list of available tasks
task_name = 'FoodSeg103' # Options: 'Food101', 'FoodSeg103', 'Nutrition5K', 'Recipe1M'
# Load the dataset for the chosen task
data = load_dataset('AdaptLLM/food-VQA-benchmark', task_name, split='test')
# Inspect the first example
print(data[0])
```
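To get a quick overview of all four tasks, the loading call above can be wrapped in a small loop. The sketch below is only an illustration: `summarize_tasks`, `REPO_ID`, and `TASKS` are hypothetical names introduced here, and the loader is passed in as an argument so the helper can be tried without downloading anything. It assumes the returned split supports `len()` and exposes `column_names`, as `datasets.Dataset` objects do.

```python
from typing import Any, Callable, Dict, Iterable

REPO_ID = "AdaptLLM/food-VQA-benchmark"
TASKS = ("Food101", "FoodSeg103", "Nutrition5K", "Recipe1M")


def summarize_tasks(load_fn: Callable[..., Any],
                    tasks: Iterable[str] = TASKS) -> Dict[str, Dict[str, Any]]:
    """Load the test split of each task and record its size and column names."""
    summary = {}
    for task in tasks:
        data = load_fn(REPO_ID, task, split="test")
        summary[task] = {
            "num_rows": len(data),
            "columns": list(data.column_names),
        }
    return summary


# Usage (downloads data, so it is left commented out):
# from datasets import load_dataset
# for task, info in summarize_tasks(load_dataset).items():
#     print(task, info)
```

Passing `load_dataset` in as a parameter is a design choice for this sketch only; calling it directly inside the loop works just as well.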
The mapping between category names and indices for `Food101`, `FoodSeg103`, and `Nutrition5K` datasets is provided in the following files:
<details>
<summary> Click to expand </summary>
- Food101: `food101_name_to_label_map.json`
- FoodSeg103: `foodSeg103_id2label.json`
- Nutrition5K: `nutrition5k_ingredients.py`
#### Example Usages:
**Food101**
```python
import json
# Load the mapping file
map_path = 'food101_name_to_label_map.json'
with open(map_path) as f:
    name_to_label_map = json.load(f)
name_to_label_map = {key.replace('_', ' '): value for key, value in name_to_label_map.items()}
# Reverse mapping: label to name
label_to_name_map = {value: key for key, value in name_to_label_map.items()}
```
**FoodSeg103**
```python
import json
# Load the mapping file
map_path = 'foodSeg103_id2label.json'
with open(map_path) as f:
    id2name_map = json.load(f)
# Remove background and irrelevant labels
id2name_map.pop("0") # Background
id2name_map.pop("103") # Other ingredients
# Convert keys to integers
id2name_map = {int(key): value for key, value in id2name_map.items()}
# Create reverse mapping: name to ID
name2id_map = {value: key for key, value in id2name_map.items()}
```
**Nutrition5K**
```python
from nutrition5k_ingredients import all_ingredients
# Create mappings
id2name_map = dict(enumerate(all_ingredients))
name2id_map = {value: key for key, value in id2name_map.items()}
```
</details>
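The three mapping recipes above can also be expressed as small functions that work on already-loaded data. This is a minimal sketch, not part of the released code: the function names are hypothetical, and the sample entries at the bottom are toy placeholders rather than the actual contents of the mapping files.

```python
from typing import Dict, List, Tuple


def food101_label_to_name(name_to_label: Dict[str, int]) -> Dict[int, str]:
    """Invert a Food101 name->label map, replacing underscores with spaces."""
    return {label: name.replace("_", " ") for name, label in name_to_label.items()}


def foodseg103_id_to_name(id2label: Dict[str, str],
                          drop: Tuple[str, ...] = ("0", "103")) -> Dict[int, str]:
    """Drop background ("0") and other ingredients ("103"), then cast keys to int."""
    return {int(key): name for key, name in id2label.items() if key not in drop}


def nutrition5k_id_to_name(all_ingredients: List[str]) -> Dict[int, str]:
    """Map each ingredient's list position to its name."""
    return dict(enumerate(all_ingredients))


# Toy placeholder inputs (illustrative only, not the real file contents)
food101_sample = {"apple_pie": 0, "baby_back_ribs": 1}
foodseg_sample = {"0": "background", "1": "candy", "2": "egg tart", "103": "other ingredients"}
nutrition_sample = ["cottage cheese", "strawberries"]

print(food101_label_to_name(food101_sample))
print(foodseg103_id_to_name(foodseg_sample))
print(nutrition5k_id_to_name(nutrition_sample))
```

In practice the inputs would come from `json.load` on the two JSON files and from `nutrition5k_ingredients.all_ingredients`, as in the snippets above.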
## 2. Evaluate Any MLLM Compatible with vLLM on the Food Benchmarks
We provide a guide to directly evaluate MLLMs such as LLaVA-v1.6 ([open-source version](https://huggingface.co/Lin-Chen/open-llava-next-llama3-8b)), Qwen2-VL-Instruct, and Llama-3.2-Vision-Instruct.
To evaluate other MLLMs, refer to [this guide](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py) for modifying the `BaseTask` class in the [vllm_inference/utils/task.py](https://github.com/bigai-ai/QA-Synthesizer/blob/main/vllm_inference/utils/task.py) file.
Feel free to reach out to us for assistance!
**The dataset loading script is embedded in the inference code, so you can directly run the following commands to evaluate MLLMs.**
### 1) Setup
Install vLLM using `pip` or [from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source).
As recommended in the official vLLM documentation, install vLLM in a **fresh** conda environment:
```bash
conda create -n vllm python=3.10 -y
conda activate vllm
pip install vllm # Ensure vllm>=0.6.2 for compatibility with Llama-3.2. If Llama-3.2 is not used, vllm==0.6.1 is sufficient.
```
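The version note in the install command can be checked programmatically before launching a long evaluation. The following is a rough sketch under stated assumptions: `parse_version` and `vllm_is_sufficient` are hypothetical helpers, `0.6.1` is treated as a minimum rather than an exact pin, and `'mllama'` is the model type used for Llama-3.2-Vision-Instruct in the evaluation step of this guide.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse leading numeric components, e.g. '0.6.3.post1' -> (0, 6, 3)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'post1'
        parts.append(int(digits))
    return tuple(parts)


def vllm_is_sufficient(installed: str, model_type: str) -> bool:
    """Return True if the installed vLLM version meets the minimum noted above."""
    required = (0, 6, 2) if model_type == "mllama" else (0, 6, 1)
    return parse_version(installed) >= required


# Usage (requires vLLM to be installed, so it is left commented out):
# from importlib.metadata import version
# print(vllm_is_sufficient(version("vllm"), "mllama"))
```

A hand-rolled parser keeps the sketch dependency-free; `packaging.version.Version` would be a more robust choice if the `packaging` package is available.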
Clone the repository and navigate to the inference directory:
```bash
git clone https://github.com/bigai-ai/QA-Synthesizer.git
cd QA-Synthesizer/vllm_inference
RESULTS_DIR=./eval_results # Directory for saving evaluation scores
```
### 2) Evaluate
Run the following commands:
```bash
# Specify the domain: choose from ['food', 'Recipe1M', 'Nutrition5K', 'Food101', 'FoodSeg103']
# 'food' runs inference on all food tasks; others run on individual tasks.
DOMAIN='food'
# Specify the model type: choose from ['llava', 'qwen2_vl', 'mllama']
# For LLaVA-v1.6, Qwen2-VL, and Llama-3.2-Vision-Instruct, respectively.
MODEL_TYPE='qwen2_vl'
# Set the model repository ID on Hugging Face. Examples:
# "Qwen/Qwen2-VL-2B-Instruct", "AdaptLLM/food-Qwen2-VL-2B-Instruct" for MLLMs based on Qwen2-VL-Instruct.
# "meta-llama/Llama-3.2-11B-Vision-Instruct", "AdaptLLM/food-Llama-3.2-11B-Vision-Instruct" for MLLMs based on Llama-3.2-Vision-Instruct.
# "AdaptLLM/food-LLaVA-NeXT-Llama3-8B" for MLLMs based on LLaVA-v1.6.
MODEL=AdaptLLM/food-Qwen2-VL-2B-Instruct
# Set the directory for saving model prediction outputs:
OUTPUT_DIR=./output/AdaMLLM-food-Qwen-2B_${DOMAIN}
# Run inference with data parallelism; adjust CUDA devices as needed:
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
```
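When sweeping over several models or domains, it can help to assemble the same command from Python. The sketch below only mirrors the positional argument order shown in the shell command above (`MODEL`, `DOMAIN`, `MODEL_TYPE`, `OUTPUT_DIR`, `RESULTS_DIR`); the helper names are hypothetical, and nothing is assumed about `run_inference.sh` beyond that order.

```python
import os
import shlex
from typing import Dict, List

DOMAINS = ("food", "Recipe1M", "Nutrition5K", "Food101", "FoodSeg103")
MODEL_TYPES = ("llava", "qwen2_vl", "mllama")


def build_inference_command(model: str, domain: str, model_type: str,
                            output_dir: str, results_dir: str) -> List[str]:
    """Build the argument list for run_inference.sh, validating the choices."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown domain: {domain!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return ["bash", "run_inference.sh", model, domain, model_type,
            output_dir, results_dir]


def build_env(cuda_devices: str) -> Dict[str, str]:
    """Copy the current environment and set CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = cuda_devices
    return env


cmd = build_inference_command(
    "AdaptLLM/food-Qwen2-VL-2B-Instruct", "food", "qwen2_vl",
    "./output/AdaMLLM-food-Qwen-2B_food", "./eval_results",
)
print(shlex.join(cmd))

# To actually launch it from QA-Synthesizer/vllm_inference:
# import subprocess
# subprocess.run(cmd, env=build_env("0,1,2,3"), check=True)
```

Validating `domain` and `model_type` up front is a convenience of this sketch, so a typo fails immediately instead of after the model has loaded.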
Detailed scripts to reproduce our results are in [Evaluation.md](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Evaluation.md).
### 3) Results
The evaluation results are stored in `./eval_results`, and the model prediction outputs are in `./output`.
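The file layout inside these two directories is not described here, so a reasonable first step after a run is simply to list what was written. A minimal sketch, with `list_files` being a hypothetical helper and no assumption made about file names or formats:

```python
from pathlib import Path
from typing import List


def list_files(root: str) -> List[str]:
    """Return sorted relative paths of all files under root, or [] if root is missing."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        path.relative_to(base).as_posix()
        for path in base.rglob("*")
        if path.is_file()
    )


for directory in ("./eval_results", "./output"):
    print(directory, list_files(directory))
```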
## Citation
If you find our work helpful, please cite us.
[Adapt MLLM to Domains](https://huggingface.co/papers/2411.19930) (EMNLP 2025 Findings)
```bibtex
@article{adamllm,
title={On Domain-Adaptive Post-Training for Multimodal Large Language Models},
author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
journal={arXiv preprint arXiv:2411.19930},
year={2024}
}
```
[Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)
```bibtex
@inproceedings{
cheng2024adapting,
title={Adapting Large Language Models via Reading Comprehension},
author={Daixuan Cheng and Shaohan Huang and Furu Wei},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=y886UXPEZ0}
}
```
Provider: maas
Created: 2025-01-08
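Dataset introduction
Background overview
This dataset is a visual question answering benchmark for evaluating multimodal large language models in the food domain. It contains data for several food-related subtasks, including Food101 and FoodSeg103, and provides data loading scripts and a model evaluation guide so that users can test how well models adapt to the domain.
Summary collected and generated by 遇见数据集.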



