GeoCaption-12K
Source: IEEE, indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/rsvl-caption
Resource description:
Multimodal remote sensing datasets are scarce, particularly those that pair high-resolution imagery with descriptive text, and this limits progress in context-aware analysis. To address this, we introduce a dataset of 12,473 aerial and satellite images. The images are drawn from established benchmarks (RSSCN7, DLRSD, iSAID, LoveDA, and WHU) and enriched with automatically generated pseudo-captions and semantic tags.

The dataset is built with a two-step pipeline:

1. Structured prompts are constructed from polygon-based annotations, and GPT-4o generates detailed captions describing spatial layouts and object relationships. The resulting captions average 181.94 words and together use a vocabulary of 3,961 unique words.
2. The captions are filtered to reduce noise and paired with semantic tags. The tags are extracted via named entity recognition and part-of-speech tagging and provide domain-specific cues (e.g., "building," "river," "runway").

Because its visual-textual annotations are generated automatically, the dataset substantially reduces manual annotation costs. It supports multimodal remote sensing tasks such as scene understanding and spatially informed visual representation learning across diverse urban, rural, and natural environments. The dataset is publicly available to foster research in remote sensing and multimodal learning.
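The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The following are all hypothetical:

- the annotation format (a class label plus polygon vertices in pixels)
- the 3x3 location grid
- the prompt wording
- the regex tokenizer
- the tag lexicon
- the function names `build_prompt`, `caption_stats`, and `extract_tags`

The GPT-4o call and the noise filter are omitted. The fixed-lexicon lookup in `extract_tags` is a deliberately simplified stand-in for the named entity recognition and part-of-speech tagging the dataset actually uses.

```python
import re
from collections import Counter
from statistics import mean

# Hypothetical annotation record: (class_label, [(x, y), ...]) in pixel coordinates.

def polygon_area(points):
    """Shoelace formula for the area of a simple polygon."""
    s = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def vertex_mean(points):
    """Average of the vertices, used here as a rough object location."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def grid_cell(cx, cy, width, height):
    """Map a point to a coarse 3x3 grid name such as 'top-left' (an assumed scheme)."""
    rows, cols = ["top", "middle", "bottom"], ["left", "center", "right"]
    r = min(int(3 * cy / height), 2)
    c = min(int(3 * cx / width), 2)
    return f"{rows[r]}-{cols[c]}"

def build_prompt(annotations, width, height):
    """Step 1 (sketch): turn polygon annotations into a structured text prompt."""
    lines = [f"Image size: {width}x{height} pixels. Objects:"]
    for label, pts in annotations:
        cx, cy = vertex_mean(pts)
        lines.append(f"- {label}: area {polygon_area(pts):.0f} px^2, "
                     f"located {grid_cell(cx, cy, width, height)}")
    lines.append("Describe the spatial layout and the relationships between these objects.")
    return "\n".join(lines)

# Assumed tokenizer: lowercase alphabetic words, keeping internal hyphens.
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")

def caption_stats(captions):
    """Average caption length in tokens and size of the combined vocabulary."""
    token_lists = [TOKEN.findall(c.lower()) for c in captions]
    avg_len = mean(len(toks) for toks in token_lists)
    vocab = {w for toks in token_lists for w in toks}
    return avg_len, len(vocab)

def _normalize(token, lexicon):
    """Map a token to a lexicon entry, allowing a naive trailing-'s' plural."""
    if token in lexicon:
        return token
    if token.endswith("s") and token[:-1] in lexicon:
        return token[:-1]
    return None

def extract_tags(caption, lexicon):
    """Step 2 (sketch): lexicon lookup standing in for NER/POS-based tagging.
    Returns matched tags ordered by frequency, ties kept in first-seen order."""
    hits = (_normalize(t, lexicon) for t in TOKEN.findall(caption.lower()))
    counts = Counter(h for h in hits if h is not None)
    return [tag for tag, _ in counts.most_common()]
```

For example, a 50x50-pixel building polygon near the image origin of a 512x512 tile is rendered as `- building: area 2500 px^2, located top-left`. Run over a full caption set, `caption_stats` yields the kind of figures quoted above (average caption length and vocabulary size). The exact values depend on the tokenizer, which the description does not specify.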
Provided by:
Xing Zi



