UCM-Captions, Sydney-Captions, RSICD, RSITMD, NWPU-Captions, RS5M, SkyScript
Source: GitHub · Updated 2024-12-09 · Indexed 2024-12-10
Download link:
https://github.com/BaolanChen/Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval
Resource summary:
UCM-Captions: Contains 2,100 images with a resolution of 256×256.
Sydney-Captions: Contains 613 images with a resolution of 500×500.
RSICD: Contains 10,921 images with a resolution of 224×224.
RSITMD: Contains 4,743 images with a resolution of 256×256.
NWPU-Captions: Contains 31,500 images with a resolution of 256×256.
RS5M: Contains over 5 million images with arbitrary resolutions.
SkyScript: Contains 5.2 million images with arbitrary resolutions.
Created:
2024-11-19
Original information summary
Awesome-Remote-Sensing-Cross-Modal-Image-Text-Retrieval
Dataset overview
Remote sensing image-text datasets
| Dataset | Number of images | Image resolution | VLMs |
|---|---|---|---|
| UCM-Captions | 2,100 | 256 × 256 | - |
| Sydney-Captions | 613 | 500 × 500 | - |
| RSICD | 10,921 | 224 × 224 | - |
| RSITMD | 4,743 | 256 × 256 | - |
| NWPU-Captions | 31,500 | 256 × 256 | - |
| RS5M | 5 million+ | Arbitrary | GeoRSCLIP |
| SkyScript | 5.2 million+ | Arbitrary | SkyCLIP |
Remote sensing cross-modal image-text retrieval models
| Paper | Title | Publication | Institution | Code | Notes |
|---|---|---|---|---|---|
| CDMAN | Thread the Needle: Cues-Driven Multi-Association for Remote Sensing Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | - | |
| MSA | Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval | TGRS 2024 | Xidian University | Github | |
| KTIR | Knowledge-aware Text-Image Retrieval for Remote Sensing Images | TGRS 2024 | EPFL | - | |
| CMPAGL | Cross-Modal Prealigned Method With Global and Local Information for Remote Sensing Image and Text Retrieval | TGRS 2024 | Shanghai Maritime University | Github | |
| FGIS | Fine-Grained Information Supplementation and Value-Guided Learning for Remote Sensing Image-Text Retrieval | JSTARS 2024 | Chongqing University | - | |
| EBAKER | Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning | ACMMM 2024 | Tianjin University | - | |
| CUP | Cross-Modal Remote Sensing Image–Text Retrieval via Context and Uncertainty-Aware Prompt | TNNLS 2024 | Xidian University | Github | |
| CCLS2T | Cross-Modal Contrastive Learning With Spatiotemporal Context for Correlation-Aware Multiscale Remote Sensing Image Retrieval | TGRS 2024 | Xidian University | - | |
| MIIA | Global–Local Information Soft-Alignment for Cross-Modal Remote-Sensing Image–Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SARCI | Scale-Aware Adaptive Refinement and Cross-Interaction for Remote Sensing Audio-Visual Cross-Modal Retrieval | TGRS 2024 | Wuhan University of Technology | Github | |
| GLISA | Masking-Based Cross-Modal Remote Sensing Image–Text Retrieval via Dynamic Contrastive Learning | TGRS 2024 | China University of Mining and Technology | - | |
| SCAT | Spatial–Channel Attention Transformer With Pseudo Regions for Remote Sensing Image-Text Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| FSISR | Cross-Modal Hashing With Feature Semi-Interaction and Semantic Ranking for Remote Sensing Ship Image Retrieval | TGRS 2024 | Harbin Institute of Technology | - | |
| SkyEyeGPT | Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model | Arxiv 2024 | Northwestern Polytechnical University | Github | |
| MFF-SFE | Cross-modal retrieval method based on MFF-SFE for remote sensing image-text | Journal of University of Chinese Academy of Sciences 2024 | Aerospace Information Research Institute, Chinese Academy of Sciences | - | |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | TGRS 2024 | Hohai University | Github | |
| C2F-ITR | From Coarse To Fine: An Offline-Online Approach for Remote Sensing Cross-Modal Retrieval | IGARSS 2024 | Beijing Foreign Studies University | - | |
| MGRM-EL | Exploring Uni-Modal Feature Learning on Entities and Relations for Remote Sensing Cross-Modal Text-Image Retrieval | TGRS 2024 | Northwestern Polytechnical University | - | |
| SIRS | Multitask Joint Learning for Remote Sensing Foreground-Entity Image–Text Retrieval | TGRS 2024 | Soochow University | Github | |
| PIR | A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval | ACMMM 2023 oral | Zhejiang University of Technology | Github | |
| PE-RSITR | Parameter-Efficient Transfer Learning for Remote Sensing Image–Text Retrieval | TGRS 2023 | Northwestern Polytechnical University | Github | |
| HVSA | Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning | TGRS 2023 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| SWAN | Reducing Semantic Confusion Scene-aware Aggregation Network for Remote Sensing Cross-modal Retrieval | ICMR 2023 oral | Zhejiang University of Technology | Github | |
| KAMCL | Knowledge-Aided Momentum Contrastive Learning for Remote-Sensing Image Text Retrieval | TGRS 2023 | Tianjin University | Github | |
| IEFT | Interacting-Enhancing Feature Transformer for Cross-Modal Remote-Sensing Image and Text Retrieval | TGRS 2023 | Xidian University | Github | |
| Multilanguage Transformer | Multilanguage Transformer for Improved Text to Remote Sensing Image Retrieval | JSTARS 2022 | King Saud University | - | |
| GaLR | Remote Sensing Cross-Modal Text-Image Retrieval Based on Global and Local Information | TGRS 2022 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| AMFMN | Exploring a Fine-Grained Multiscale Method for Cross-Modal Remote Sensing Image Retrieval | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| LW-MCR | A Lightweight Multi-Scale Crossmodal Text-Image Retrieval Method in Remote Sensing | TGRS 2021 | Aerospace Information Research Institute, Chinese Academy of Sciences | Github | |
| VSE++ | VSE++: Improving Visual-Semantic Embeddings with Hard Negatives | BMVC 2018 spotlight | University of Toronto | Github | |
Remote sensing vision foundation models
| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| GeoKR | Geographical Knowledge-Driven Representation Learning for Remote Sensing Images | TGRS2021 | GeoKR | link |
| GASSL | Geography-Aware Self-Supervised Learning | ICCV2021 | GASSL | link |
Remote sensing vision-language foundation models
| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| RSGPT | RSGPT: A Remote Sensing Vision Language Model and Benchmark | Arxiv2023 | RSGPT | link |
| RemoteCLIP | RemoteCLIP: A Vision Language Foundation Model for Remote Sensing | Arxiv2023 | RemoteCLIP | link |
| GeoRSCLIP | RS5M: A Large Scale Vision-Language Dataset for Remote Sensing Vision-Language Foundation Model | Arxiv2023 | GeoRSCLIP | link |
| GRAFT | Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | ICLR2024 | GRAFT | - |
Remote sensing vision-location foundation models
| Abbreviation | Title | Publication | Paper | Code & Weights |
|---|---|---|---|---|
| CSP | CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations | ICML2023 | CSP | link |
| GeoCLIP | GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization | NeurIPS2023 | GeoCLIP | link |
| SatCLIP | SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery | Arxiv2023 | SatCLIP | link |
Collected summary
Dataset introduction

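Construction
In remote sensing, datasets such as UCM-Captions, Sydney-Captions, RSICD, RSITMD, NWPU-Captions, RS5M, and SkyScript were built to support cross-modal image-text retrieval. They pair remote sensing images with text descriptions, collected and annotated at scale to cover diverse scenes and land-cover types. Image sizes range from 224×224 to 500×500 in the five captioning benchmarks, while RS5M and SkyScript contain images at arbitrary resolutions, giving models rich visual and semantic information for training.
Characteristics
The datasets are characterized by high-resolution imagery and diversity, which suits them to cross-modal retrieval between remote sensing images and text. Their scale ranges from several hundred images to more than five million, so models can be trained and evaluated at very different data scales. RS5M and SkyScript in particular not only contain far more images but also span arbitrary resolutions, which offers flexibility across application scenarios.
Usage
To train a model on these datasets, first select the image-text pairs suited to the task. Images are then typically preprocessed, for example by resizing them to a fixed resolution and normalizing pixel values, and the text is tokenized and encoded. Training can use techniques such as contrastive learning and multimodal fusion to improve cross-modal retrieval performance. Finally, evaluate the model on a validation set and tune it as needed.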
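The preprocessing and contrastive-learning steps described above can be sketched as follows. This is a minimal illustration in NumPy rather than the training code of any model listed on this page: the function names, the temperature of 0.07, and the idea that row i of each embedding matrix forms a matched pair are all assumptions made for the example.

```python
import numpy as np


def normalize_image(img_uint8, mean, std):
    """Scale an HxWx3 uint8 image to [0, 1], then standardize each channel."""
    x = img_uint8.astype(np.float32) / 255.0
    return (x - np.asarray(mean, dtype=np.float32)) / np.asarray(std, dtype=np.float32)


def l2_normalize(x, eps=1e-12):
    """Give every row unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """InfoNCE loss averaged over the image-to-text and text-to-image directions.

    Assumption: row i of image_emb and row i of text_emb are a matched pair,
    and every other row in the batch acts as a negative.
    """
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature            # (N, N) scaled cosine similarities
    idx = np.arange(logits.shape[0])
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # each image picks its text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # each text picks its image
    return 0.5 * (loss_i2t + loss_t2i)
```

In practice the embeddings would come from an image encoder and a text encoder trained jointly in a deep learning framework; the NumPy version only shows how the loss rewards matched pairs over the other pairs in a batch.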
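Background and challenges
Background overview
Remote sensing plays an important role in modern geographic information systems, environmental monitoring, and disaster management. In recent years, as cross-modal data processing has matured, joint analysis of remote sensing images and text has become an active research topic. Datasets such as UCM-Captions, Sydney-Captions, RSICD, RSITMD, NWPU-Captions, RS5M, and SkyScript were created to advance research on cross-modal retrieval between remote sensing images and text. Groups at institutions such as Wuhan University of Technology, Xidian University, and King Saud University appear among the retrieval work listed above. The datasets mainly target semantic alignment between remote sensing images and text, which matters for understanding and applying remote sensing data.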
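Current challenges
Building these datasets involves several challenges. First, the semantic gap between remote sensing images and natural-language descriptions is large, so matching images to text accurately is difficult. Second, construction requires handling large volumes of high-resolution imagery, which places heavy demands on storage and compute. Third, standardization and interoperability across datasets remain open problems that affect the reproducibility and broader use of research results. Finally, as remote sensing technology advances, the datasets need ongoing updates to reflect new techniques and application needs.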
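Common scenarios
Classic use cases
In remote sensing, the classic use of datasets such as UCM-Captions, Sydney-Captions, RSICD, RSITMD, NWPU-Captions, RS5M, and SkyScript is cross-modal image-text retrieval (RSCMIT). By providing remote sensing images together with their text descriptions, the datasets let researchers develop and validate vision-and-language models. For example, they are commonly used to train and evaluate image-text matching models that retrieve relevant remote sensing images from a text description, or the reverse. They are also used to study feature alignment in multimodal learning, with the aim of improving cross-modal understanding and reasoning.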
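Retrieval in both directions is commonly scored with Recall@K, the fraction of queries whose top K results contain a correct match. The sketch below assumes a precomputed caption-by-image similarity matrix and a hypothetical `caption_to_image` index array mapping each caption to its image; both names are illustrative and do not come from any listed codebase.

```python
import numpy as np


def recall_at_k(sim, gt, k):
    """Fraction of queries whose top-k ranked candidates include a match.

    sim: (num_queries, num_candidates) similarity scores, higher is better.
    gt:  boolean array of the same shape, True where the pair matches.
    """
    topk = np.argsort(-sim, axis=1)[:, :k]                  # best k candidates per query
    hits = np.take_along_axis(gt, topk, axis=1).any(axis=1)
    return float(hits.mean())


def retrieval_report(sim_t2i, caption_to_image, ks=(1, 5, 10)):
    """Recall@K in both directions from a caption-by-image similarity matrix.

    caption_to_image[c] is the index of the image that caption c describes,
    so one image may have several captions (an assumption of this sketch).
    """
    n_caps, n_imgs = sim_t2i.shape
    gt = np.zeros((n_caps, n_imgs), dtype=bool)
    gt[np.arange(n_caps), caption_to_image] = True
    report = {}
    for k in ks:
        report[f"t2i_R@{k}"] = recall_at_k(sim_t2i, gt, k)      # text query, image results
        report[f"i2t_R@{k}"] = recall_at_k(sim_t2i.T, gt.T, k)  # image query, text results
    return report
```

For image-to-text retrieval a query counts as a hit if any of that image's captions appears in the top K, which is one common convention; other papers may define the metric differently.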
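Derived related work
Researchers have built a range of methods on these remote sensing image-text datasets. For example, models such as CDMAN, MSA, and KTIR introduce multimodal alignment and knowledge enhancement to improve image-text retrieval accuracy. Methods such as CMPAGL and CCLS2T combine global and local information to further refine cross-modal retrieval. Models such as SkyEyeGPT and RemoteCLIP use large-scale pretraining and instruction tuning to provide stronger remote sensing vision-language foundation models. This body of work has drawn wide attention in academia, shows potential in practical applications, and continues to advance remote sensing cross-modal retrieval.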
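Recent research on the datasets
Latest research directions
Research on cross-modal image-text retrieval (RSCMIT) in remote sensing is developing quickly. Recent work concentrates on vision-language foundation models such as RSGPT, RemoteCLIP, and GeoRSCLIP, with large-scale datasets such as RS5M and SkyScript supporting models of this kind (GeoRSCLIP and SkyCLIP respectively, according to the dataset table above) and aiming at more precise image-text matching. Researchers are also exploring multitask joint learning, knowledge-aided contrastive learning, and geography-aware self-supervised learning to improve cross-modal retrieval between remote sensing images and text. This work advances remote sensing technology and opens new application possibilities in areas such as geographic information systems and smart cities.
The content above was collected and summarized by 遇见数据集.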



