remote-sensing-visual-instructions

Name: remote-sensing-visual-instructions
Creator: maas
Published: 2025-12-04 09:18:55
License: No description available

ModelScope community. Updated: 2025-12-04. Indexed: 2025-04-26.

Download link:

https://modelscope.cn/datasets/AdaptLLM/remote-sensing-visual-instructions


Resource overview:

# Adapting Multimodal Large Language Models to Domains via Post-Training (EMNLP 2025)

This repo contains the **remote-sensing visual instructions for post-training MLLMs** in our paper: [On Domain-Specific Post-Training for Multimodal Large Language Models](https://huggingface.co/papers/2411.19930).

The main project page is: [Adapt-MLLM-to-Domains](https://huggingface.co/AdaptLLM/Adapt-MLLM-to-Domains)

## Data Information

Using our [visual instruction synthesizer](https://huggingface.co/AdaptLLM/visual-instruction-synthesizer), we generate visual instruction tasks based on the image-caption pairs from NWPU-Captions, RSICD, RSITMD, Sydney-captions and UCM-captions. These synthetic tasks, combined with the original image captioning tasks, are used to train general MLLMs through a single-stage post-training process.

- **image_caption_and_synthetic_task.json**: This dataset is used to reproduce our `single-stage domain-specific post-training`. It contains both image-captioning tasks and synthetic visual-instruction tasks, totaling 36K examples.

## To Download the Data

1. Set up dependencies:

```bash
pip install "huggingface_hub[cli]"
```

2. Download data:

```bash
REPO="AdaptLLM/remote-sensing-visual-instructions"

# The local directory where you intend to save the files
LOCAL_DIR="./remote-sensing-visual-instructions"

huggingface-cli download --resume-download ${REPO} --local-dir ${LOCAL_DIR} --repo-type dataset
```

## To reproduce the data

We have included detailed scripts to reproduce the data in [Synthesis.md](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Synthesis.md).

## To train MLLMs using the data

Our training data can be easily used to train MLLMs based on the `Llava` repository or the `LLaMA Factory` repository. Please refer to the [Post-Train Guide](https://github.com/bigai-ai/QA-Synthesizer/blob/main/docs/Post_Train.md).

## Licensing Information

This data collection contains image-caption pairs from various sources. Please ensure compliance with their respective licenses.

## Citation

If you find our work helpful, please cite us.

[Adapt MLLM to Domains](https://huggingface.co/papers/2411.19930) (EMNLP 2025 Findings)

```bibtex
@article{adamllm,
  title={On Domain-Adaptive Post-Training for Multimodal Large Language Models},
  author={Cheng, Daixuan and Huang, Shaohan and Zhu, Ziyu and Zhang, Xintong and Zhao, Wayne Xin and Luan, Zhongzhi and Dai, Bo and Zhang, Zhenliang},
  journal={arXiv preprint arXiv:2411.19930},
  year={2024}
}
```

[Adapt LLM to Domains](https://huggingface.co/papers/2309.09530) (ICLR 2024)

```bibtex
@inproceedings{adaptllm,
  title={Adapting Large Language Models via Reading Comprehension},
  author={Daixuan Cheng and Shaohan Huang and Furu Wei},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=y886UXPEZ0}
}
```
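As a complement to the download steps in the README, the following is a minimal Python sketch of the same workflow: fetching the dataset repository with `huggingface_hub.snapshot_download` and taking a first look at `image_caption_and_synthetic_task.json`. The helper names `download_dataset` and `summarize_records` are hypothetical. The sketch also assumes that the JSON file sits at the root of the downloaded directory and holds a top-level list of objects; the README does not document the file's schema, so the helper only tallies whatever top-level keys it finds.

```python
import json
from collections import Counter


def download_dataset(local_dir="./remote-sensing-visual-instructions"):
    """Fetch the dataset repo; mirrors the `huggingface-cli download` step."""
    # Imported lazily so summarize_records works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="AdaptLLM/remote-sensing-visual-instructions",
        repo_type="dataset",
        local_dir=local_dir,
    )


def summarize_records(json_path):
    """Return (record count, {top-level key: number of records containing it}).

    Assumption: the file is a JSON list of objects. The real schema is not
    documented in the README, so no specific field names are relied on.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON list")

    key_counts = Counter()
    for record in records:
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return len(records), dict(key_counts)
```

A typical use would be to call `download_dataset()`, join the returned directory with `image_caption_and_synthetic_task.json`, and pass that path to `summarize_records`. If the assumptions hold, the record count should be close to the 36K examples stated above.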


Provider:

maas

Created:

2025-04-22

Collected summary

Dataset introduction