VLMs in RS Datasets
GitHub · Updated 2024-11-03 · Indexed 2024-11-15
Download link:
https://github.com/taolijie11111/VLMs-in-RS-review
Overview:
This collection compiles datasets for vision-language models (VLMs) in the remote sensing domain. It covers three kinds of dataset: manually constructed datasets, datasets that combine existing datasets, and automatically annotated datasets. These datasets support semantic understanding and generation tasks for remote sensing images.
Created:
2024-10-28
Summary of original information
📒Awesome VLMs in RS
📖Datasets in VLMs for RS
📖Manual Datasets
| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| CVPR 2024 | [Hallusionbench] HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models | 346 | Hallusionbench | |
| arXiv 2023 | [RSICap] RSGPT: A Remote Sensing Vision Language Model and Benchmark | 2585 | RSICap | |
| TGRS 2023 | [CRSVQA] Multistep Question-Driven Visual Question Answering for Remote Sensing | 4639 | CRSVQA | |
📖Combining Datasets
| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| ICCV 2023 | [SATIN] Satin: A multi-task metadataset for classifying satellite imagery using vision-language models | ≈775K | SATIN | |
| ICCV 2023 | [GeoPile] Towards geospatial foundation models via continual pretraining | 600K | GeoPile | |
| ICCV 2023 | [SatlasPretrain] Satlaspretrain: A large-scale dataset for remote sensing image understanding | 856K | SatlasPretrain | |
| TGRS 2023 | [RSVGD] Rsvg: Exploring data and models for visual grounding on remote sensing data | 17402 | RSVGD | |
| TGRS 2024 | [RefsegRS] Rrsis: Referring remote sensing image segmentation | 4420 | RefsegRS | |
| arXiv 2024 | [SkyEye-968K] Skyeyegpt: Unifying remote sensing vision-language tasks via instruction tuning with large language model | 968K | SkyEye-968K | |
| TGRS 2024 | [MMRS-1M] Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain | 1M | MMRS-1M | |
| arXiv 2023 | [RSSA] H2rsvlm: Towards helpful and honest remote sensing large vision language model | 44K | RSSA | |
| TGRS 2024 | [FineGrip] Panoptic perception: A novel task and fine-grained dataset for universal remote sensing image interpretation | 2649 | | |
| CVPR 2024 | [RRSIS-D] Rotated multiscale interaction network for referring remote sensing image segmentation | 17402 | RRSIS-D | |
| TGRS 2022 | [RingMo] Ringmo: A remote sensing foundation model with masked image modeling | 2096640 | link | |
| arXiv 2023 | [GRAFT] Remote sensing vision-language foundation models without annotations via ground remote alignment | - | | |
| CVPR 2024 | [SkySense] Skysense: A multi-modal remote sensing foundation model towards universal interpretation for earth observation imagery | 21.5M | | |
| AAAI 2024 | [EarthVQA] Earthvqa: Towards queryable earth via relational reasoning-based remote sensing visual question answering | 6000 | EarthVQA | |
| TGRS 2024 | [GeoSense] Generative convnet foundation model with sparse modeling and low-frequency reconstruction for remote sensing image interpretation | ≈9M | link | GeoSense |
📖Automatically Annotated Datasets
| Published in | Title | Image | Paper | Code/Project |
|---|---|---|---|---|
| TGRS 2024 | [RS5M] Rs5m and georsclip: A large scale vision-language dataset and a large vision-language model for remote sensing | 5M | RS5M | |
| AAAI 2024 | [SkyScript] Skyscript: A large and semantically diverse vision-language dataset for remote sensing | 2.6M | | |
| arXiv 2024 | [LHRS-Align] Lhrs-bot: Empowering remote sensing with vgi-enhanced large multimodal language model | 1.15M | LHRS-Align | |
| CVPR 2024 | [GeoChat] Geochat: Grounded large vision-language model for remote sensing | 318K | GeoChat | |
| ICML 2024 | [GeoReasoner] Georeasoner: Geo-localization with reasoning in street views using a large vision-language model | 70K+ | GeoReasoner | |
| arXiv 2023 | [HqDC-1.4M] H2rsvlm: Towards helpful and honest remote sensing large vision language model | ≈1.4M | HqDC-1.4M | |
| CVPR 2024 | [ChatEarthNet] ChatEarthNet: A Global-Scale Image-Text Dataset Empowering Vision-Language Geo-Foundation Models | 163488 | ChatEarthNet | |
| arXiv 2024 | [VRSBench] Vrsbench: A versatile vision-language benchmark dataset for remote sensing image understanding | 29614 | VRSBench | |
| arXiv 2024 | [FIT-RS] Skysensegpt: A fine-grained instruction tuning dataset and model for remote sensing vision-language understanding | 1800.8K | FIT-RS | |
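The "Image" column in the tables above mixes plain counts (2585), suffixed values (600K, 21.5M), and approximate or open-ended values (≈775K, 70K+), so the entries are hard to compare directly. The following is a minimal sketch of one way to normalize them; the helper name `parse_size` and the `sizes` dictionary are illustrative assumptions, not part of the original repository.

```python
import re
from decimal import Decimal

# Multipliers for the suffixes that appear in the "Image" column.
_SUFFIX = {"": 1, "K": 1_000, "M": 1_000_000}


def parse_size(cell: str):
    """Convert an "Image" cell such as "≈775K", "70K+" or "21.5M" to an int.

    Returns None when the cell holds no number (for example "-").
    Approximation markers ("≈", "+") are dropped, so the result is a
    nominal figure, not an exact count.
    """
    match = re.search(r"(\d+(?:\.\d+)?)\s*([KM]?)", cell.strip())
    if match is None:
        return None
    number, suffix = match.groups()
    # Decimal avoids float rounding surprises for values like "1800.8K".
    return int(Decimal(number) * _SUFFIX[suffix])


# A few entries copied from the tables above (hypothetical selection).
sizes = {
    "RSICap": "2585",
    "SATIN": "≈775K",
    "GRAFT": "-",
    "SkySense": "21.5M",
    "RS5M": "5M",
    "GeoReasoner": "70K+",
}

# Keep only datasets with a parseable size, largest first.
parsed = {name: parse_size(cell) for name, cell in sizes.items()}
ranked = sorted(
    ((name, n) for name, n in parsed.items() if n is not None),
    key=lambda item: item[1],
    reverse=True,
)
for name, n in ranked:
    print(f"{name}: {n:,}")
```

Note that the column does not always count the same thing across rows, so a ranking like this is only a rough indication of scale.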
Collected summary
Dataset introduction

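Construction
In remote sensing, datasets for vision-language models (VLMs) are built in three main ways: manual construction, combining existing datasets, and automatic annotation. Manually built datasets such as RSICap and CRSVQA rely on human annotation to keep data quality high. Combined datasets such as SATIN and GeoPile merge data from multiple sources to improve model generalization. Automatically annotated datasets such as RS5M and SkyScript generate labels algorithmically, which greatly increases dataset scale and diversity.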
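Features
The defining features of the VLMs in RS datasets are that they are multimodal and large in scale. They pair rich visual information with language descriptions, which helps models understand and process complex remote sensing images. Several are also very large, for example SkyEye-968K and MMRS-1M, and provide ample training samples that help improve model performance and robustness.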
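Usage
Researchers can use these datasets to train and evaluate models on a range of tasks, including image classification, object detection, and image captioning. Loading the preprocessed datasets makes it quick to set up a model and run experiments. The diversity of the datasets and the quality of their annotations also provide a solid basis for fine-tuning and optimizing models.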
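As a rough illustration of the loading step, the sketch below reads image-caption pairs from a JSON Lines annotation file. The record layout (`{"image": ..., "caption": ...}`), the `ImageTextPair` class, and the `iter_pairs` function are all assumptions made for this example. Each dataset in the collection ships its own format, so check the corresponding project page before adapting it.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class ImageTextPair:
    """One training sample: an image on disk plus its text description."""
    image_path: Path
    caption: str


def iter_pairs(annotation_file: Path, image_root: Path) -> Iterator[ImageTextPair]:
    """Yield image-caption pairs from a JSON Lines annotation file.

    Assumed (hypothetical) record layout, one JSON object per line:
        {"image": "tile_0001.png", "caption": "an airport with two runways"}
    """
    with annotation_file.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            record = json.loads(line)
            yield ImageTextPair(image_root / record["image"], record["caption"])
```

A caller would iterate over `iter_pairs(Path("annotations.jsonl"), Path("images"))` and hand each pair to an image loader and tokenizer of their choice; both paths here are placeholders.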
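Background and challenges
Background overview
In recent years, vision-language models (VLMs) have made significant progress in artificial intelligence, particularly in remote sensing (RS). By aligning visual and language information, VLMs can handle more complex tasks, which sets them fundamentally apart from earlier discriminative models. The VLMs in RS Datasets collection was created in 2024 by Lijie Tao, Haokui Zhang, and other researchers to systematically summarize and analyze the application of VLMs in RS. It covers benchmark data for multiple tasks and also details techniques for improving VLM performance, which makes it a useful resource for advancing RS research.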
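Current challenges
Building the VLMs in RS Datasets collection involves several challenges. First, aligning visual and language information effectively, so that models can accurately understand and process complex RS imagery, remains a core difficulty. Second, constructing the datasets requires a large amount of annotation work, and the accuracy and efficiency of automatic annotation in particular limit how far the datasets can be extended and updated. Finally, the application of VLMs in RS is still at an exploratory stage, and designing evaluation metrics and methods that measure real-world model performance is an open problem.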
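Common scenarios
Classic use cases
In remote sensing, the classic use of the VLMs in RS Datasets collection is training and evaluating vision-language models. By bringing together many kinds of remote sensing images with corresponding text descriptions, the collection gives researchers a rich resource for developing and validating vision-language models for remote sensing image understanding. These models can handle complex relationships between images and text, enabling higher-level tasks such as image annotation and question answering.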
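Academic problems addressed
The VLMs in RS Datasets collection addresses the scarcity of training data for vision-language models in remote sensing. By providing large-scale, diverse pairs of remote sensing images and text, it helps improve model generalization and adaptability across tasks. This supports progress in remote sensing image understanding and also opens new research directions for cross-modal learning, giving it both academic value and practical potential.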
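Derived related work
Researchers have developed a variety of vision-language models and applications based on the VLMs in RS Datasets collection. For example, the RS-LLaVA model uses joint training to perform automatic annotation and question answering on remote sensing images, while SkySenseGPT uses fine-grained instruction tuning to improve understanding of remote sensing images and text. This work extends the range of applications for the datasets and gives new momentum to technical progress in remote sensing.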
The content above was collected and summarized by 遇见数据集.



