
remote-sensing-VQA-benchmark

ModelScope community · updated 2025-12-04 · indexed 2025-04-26
Download link:
https://modelscope.cn/datasets/AdaptLLM/remote-sensing-VQA-benchmark
Description:
# Adapting Multimodal Large Language Models to Domains via Post-Training (EMNLP 2025)

This repo contains the **remote sensing visual instruction tasks for evaluating MLLMs** in our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).

The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)

## 1. Download Data

You can load datasets using the `datasets` library:

```python
from datasets import load_dataset

# Choose the task name from the list of available tasks
task_name = 'CLRS'  # Options: 'CLRS', 'UC_Merced', 'FloodNet', 'NWPU-Captions'

# Load the dataset for the chosen task
data = load_dataset('AdaptLLM/remote-sensing-VQA-benchmark', task_name, split='test')

# Inspect the first example without materializing the whole split
print(data[0])
```

The mapping between category names and indices for 'CLRS', 'UC_Merced' is:

```python
# CLRS
label_to_name_map = {'0': 'agricultural', '1': 'airplane', '2': 'baseball diamond', '3': 'beach', '4': 'buildings', '5': 'chaparral', '6': 'dense residential', '7': 'forest', '8': 'freeway', '9': 'golf course', '10': 'harbor', '11': 'intersection', '12': 'medium residential', '13': 'mobile home park', '14': 'overpass', '15': 'parking lot', '16': 'river', '17': 'runway', '18': 'sparse residential', '19': 'storage tanks', '20': 'tennis court'}

# UC_Merced
label_to_name_map = {'0': 'agricultural', '1': 'airplane', '2': 'baseball diamond', '3': 'beach', '4': 'buildings', '5': 'chaparral', '6': 'dense residential', '7': 'forest', '8': 'freeway', '9': 'golf course', '10': 'harbor', '11': 'intersection', '12': 'medium residential', '13': 'mobile home park', '14': 'overpass', '15': 'parking lot', '16': 'river', '17': 'runway', '18': 'sparse residential', '19': 'storage tanks', '20': 'tennis court'}
```

## 2. Evaluate Any MLLM Compatible with vLLM on the Remote Sensing Benchmarks

We provide a guide to directly evaluate MLLMs such as LLaVA-v1.6 ([open-source version](https://huggingface.co/Lin-Chen/open-llava-next-llama3-8b)), Qwen2-VL-Instruct, and Llama-3.2-Vision-Instruct. To evaluate other MLLMs, refer to [this guide](https://github.com/vllm-project/vllm/blob/main/examples/offline_inference_vision_language.py) for modifying the `BaseTask` class in the [vllm_inference/utils/task.py](https://github.com/bigai-ai/QA-Synthesizer/blob/main/vllm_inference/utils/task.py) file. Feel free to reach out to us for assistance!

**The dataset loading script is embedded in the inference code, so you can directly run the following commands to evaluate MLLMs.**

### 1) Setup

Install vLLM using `pip` or [from source](https://vllm.readthedocs.io/en/latest/getting_started/installation.html#build-from-source).

As recommended in the official vLLM documentation, install vLLM in a **fresh** conda environment:

```bash
conda create -n vllm python=3.10 -y
conda activate vllm
pip install vllm  # Ensure vllm>=0.6.2 for compatibility with Llama-3.2. If Llama-3.2 is not used, vllm==0.6.1 is sufficient.
```

Clone the repository and navigate to the inference directory:

```bash
git clone https://github.com/bigai-ai/QA-Synthesizer.git
cd QA-Synthesizer/vllm_inference
RESULTS_DIR=./eval_results  # Directory for saving evaluation scores
```

### 2) Evaluate

Run the following commands:

```bash
# Specify the domain: choose from ['remote-sensing', 'CLRS', 'UC_Merced', 'FloodNet', 'NWPU-Captions']
# 'remote-sensing' runs inference on all remote sensing tasks; others run on individual tasks.
DOMAIN='remote-sensing'

# Specify the model type: choose from ['llava', 'qwen2_vl', 'mllama']
# For LLaVA-v1.6, Qwen2-VL, and Llama-3.2-Vision-Instruct, respectively.
MODEL_TYPE='qwen2_vl'

# Set the model repository ID on Hugging Face. Examples:
# "Qwen/Qwen2-VL-2B-Instruct", "AdaptLLM/remote-sensing-Qwen2-VL-2B-Instruct" for MLLMs based on Qwen2-VL-Instruct.
# "meta-llama/Llama-3.2-11B-Vision-Instruct", "AdaptLLM/remote-sensing-Llama-3.2-11B-Vision-Instruct" for MLLMs based on Llama-3.2-Vision-Instruct.
# "AdaptLLM/remote-sensing-LLaVA-NeXT-Llama3-8B" for MLLMs based on LLaVA-v1.6.
MODEL=AdaptLLM/remote-sensing-Qwen2-VL-2B-Instruct

# Set the directory for saving model prediction outputs:
OUTPUT_DIR=./output/AdaMLLM-remote-sensing-Qwen-2B_${DOMAIN}

# Run inference with data parallelism; adjust CUDA devices as needed:
CUDA_VISIBLE_DEVICES='0,1,2,3,4,5,6,7' bash run_inference.sh ${MODEL} ${DOMAIN} ${MODEL_TYPE} ${OUTPUT_DIR} ${RESULTS_DIR}
```

Detailed scripts to reproduce our results are in [Evaluation.md](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Evaluation.md).

### 3) Results

The evaluation results are stored in `./eval_results`, and the model prediction outputs are in `./output`.

## Citation

If you find our work helpful, please cite us.

[Adapt MLLM to Domains](https://huggingface.co/papers/2411.19930) (EMNLP 2025 Findings)

```bibtex
@article{adamllm,
  title={On Domain-Adaptive Post-Training for Multimodal Large Language Models},
  author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
  journal={arXiv preprint arXiv:2411.19930},
  year={2024}
}
```

[Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)

```bibtex
@inproceedings{
cheng2024adapting,
title={Adapting Large Language Models via Reading Comprehension},
author={Daixuan Cheng and Shaohan Huang and Furu Wei},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=y886UXPEZ0}
}
```
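
## Appendix: Decoding Label Indices

The category mapping given under "Download Data" uses string indices as keys. The sketch below shows one way to turn a label index into a category name and back. It assumes a label arrives as either an integer or a string index; the dataset card does not specify the record format, so check a loaded example first. `decode_label` and `name_to_label` are hypothetical names introduced here for illustration and are not part of the benchmark code.

```python
# Mapping copied from the dataset card (listed there for 'CLRS' and 'UC_Merced').
label_to_name_map = {
    '0': 'agricultural', '1': 'airplane', '2': 'baseball diamond', '3': 'beach',
    '4': 'buildings', '5': 'chaparral', '6': 'dense residential', '7': 'forest',
    '8': 'freeway', '9': 'golf course', '10': 'harbor', '11': 'intersection',
    '12': 'medium residential', '13': 'mobile home park', '14': 'overpass',
    '15': 'parking lot', '16': 'river', '17': 'runway', '18': 'sparse residential',
    '19': 'storage tanks', '20': 'tennis court',
}

# Reverse lookup: category name -> string index (hypothetical helper).
name_to_label = {name: idx for idx, name in label_to_name_map.items()}


def decode_label(label, mapping=label_to_name_map):
    """Return the category name for an integer or string label index.

    Assumption: labels are indices such as 7 or '7'. Unknown indices raise
    KeyError instead of silently returning a default.
    """
    key = str(label).strip()
    if key not in mapping:
        raise KeyError(f"unknown label index: {label!r}")
    return mapping[key]


if __name__ == "__main__":
    print(decode_label(7))
    print(name_to_label['harbor'])
```

Raising on unknown indices rather than returning a placeholder keeps mislabeled or out-of-range predictions visible when scoring model outputs.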

Provider: maas
Created: 2025-04-22