
GeoCaption-12K

Source: IEEE; indexed 2026-04-17
Download link:
https://ieee-dataport.org/documents/rsvl-caption
Description:
The scarcity of multimodal datasets in remote sensing, particularly those combining high-resolution imagery with descriptive textual annotations, limits advancements in context-aware analysis. To address this, we introduce a novel dataset comprising 12,473 aerial and satellite images sourced from established benchmarks (RSSCN7, DLRSD, iSAID, LoveDA, and WHU), enriched with automatically generated pseudo-captions and semantic tags. In a two-step pipeline, we first construct structured prompts from polygon-based annotations and then employ GPT-4o to generate detailed captions articulating spatial layouts and object relationships. The resulting captions average 181.94 words, with a vocabulary of 3,961 unique words. These captions are filtered for noise reduction and paired with semantic tags extracted via named entity recognition and part-of-speech tagging, providing domain-specific cues (e.g., "building," "river," "runway"). With its comprehensive visual-textual annotations, this dataset significantly reduces manual annotation costs while enabling advanced multimodal remote sensing tasks, such as scene understanding and spatially informed visual representation learning across diverse urban, rural, and natural environments. The dataset is publicly available to foster research in remote sensing and multimodal learning.
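The first pipeline step, building a structured prompt from polygon annotations, might look roughly like the Python sketch below. The description does not give the actual prompt template or annotation schema, so everything here is an assumption for illustration: the `PolygonAnnotation` fields, the 3x3 position grid, the area-share wording, and the prompt text are all hypothetical, and the call to GPT-4o is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PolygonAnnotation:
    """Hypothetical annotation record: a class label plus polygon vertices."""
    label: str
    points: List[Tuple[float, float]]  # (x, y) vertices in pixel coordinates


def polygon_area_and_centroid(points):
    """Return (area, (cx, cy)) of a simple polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    cx = cy = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0:
        # Degenerate polygon: fall back to the mean of the vertices.
        mean_x = sum(p[0] for p in points) / n
        mean_y = sum(p[1] for p in points) / n
        return 0.0, (mean_x, mean_y)
    return abs(twice_area) / 2, (cx / (3 * twice_area), cy / (3 * twice_area))


def coarse_position(cx, cy, width, height):
    """Map a centroid to one cell of an assumed 3x3 grid, e.g. 'top-left'."""
    col = min(max(int(3 * cx / width), 0), 2)
    row = min(max(int(3 * cy / height), 0), 2)
    return f"{['top', 'middle', 'bottom'][row]}-{['left', 'center', 'right'][col]}"


def build_prompt(annotations, width, height):
    """Assemble an illustrative structured prompt (not the dataset's template)."""
    lines = ["Describe this remote sensing image. Annotated objects:"]
    for ann in annotations:
        area, (cx, cy) = polygon_area_and_centroid(ann.points)
        share = 100 * area / (width * height)
        position = coarse_position(cx, cy, width, height)
        lines.append(f"- {ann.label}: {position}, covering {share:.1f}% of the image")
    lines.append("Write a detailed caption covering spatial layout and object relationships.")
    return "\n".join(lines)


if __name__ == "__main__":
    example = [
        PolygonAnnotation("building", [(0, 0), (100, 0), (100, 100), (0, 100)]),
        PolygonAnnotation("river", [(400, 0), (600, 0), (600, 1000), (400, 1000)]),
    ]
    print(build_prompt(example, width=1000, height=1000))
```

Summarizing each polygon as a label, a coarse position, and an area share keeps the prompt short while still giving a language model something spatial to describe. A real pipeline would likely add pairwise relations between objects and would then send the prompt, together with the image, to the captioning model.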
Provided by:
Xing Zi