CitySenseGPT: A Vision-Language Model for Semantic Reasoning of the Built Environment via Satellite Imagery

Figshare · Updated 2025-12-20 · Indexed 2026-04-28

Download link:

https://figshare.com/articles/dataset/CitySenseGPT_A_Vision-Language_Model_for_Semantic_Reasoning_of_the_Built_Environment_via_Satellite_Imagery/30927182




Summary:

Understanding the semantics of the built environment (BE) from satellite imagery remains a fundamental challenge. Existing approaches, however, largely focus on predicting handcrafted BE features, such as land-use labels or visual objects, without supporting deeper semantic reasoning about the BE. To address this gap, this study introduces CitySenseGPT, a multimodal vision–language model (VLM) designed to integrate feature prediction with semantic understanding, thereby expanding the spatial reasoning capabilities of AI models for remote sensing. We first construct a multimodal dataset that pairs satellite-imagery tiles with textual descriptions and numeric BE features. We then design standardized BE-VQA tasks, comprising quantitative and qualitative visual question-answering (VQA) pairs, to elicit semantic interpretations of the built environment. By integrating spatial grounding with multimodal reasoning, CitySenseGPT generates accurate outputs across diverse task categories, outperforming the state-of-the-art baselines LLaVA-1.5, GeoChat, InternVL-3, and VHM. Our experimental results show that CitySenseGPT achieves 86.31% accuracy in feature prediction, exceeding the strongest baseline by 30.12%. It also substantially outperforms all baselines in qualitative semantic understanding, improving BLEU-4 from 0.0044 to 0.266, ROUGE-L from 0.186 to 0.528, and BERTScore from 0.871 to 0.938. Although trained on data from only two US cities, CitySenseGPT generalizes effectively to unseen metropolitan regions. Overall, CitySenseGPT provides a unified multimodal framework that bridges numeric feature prediction with higher-level semantic understanding of the built environment. The results highlight the potential of VLMs as scalable tools for urban planning, geospatial analysis, and data-driven decision-making.
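The summary reports ROUGE-L for the qualitative BE-VQA answers. The sketch below illustrates what that metric measures: the longest common subsequence (LCS) of tokens between a generated answer and a reference answer, turned into precision, recall, and F1. It assumes the common sentence-level F1 formulation (β = 1) and naive lowercase whitespace tokenization. The authors' actual tokenizer, scoring library, and settings are not stated here. The function names and the example sentences are hypothetical and are not drawn from the dataset.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length of the tokens of `a` processed so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1          # extend a common subsequence
            else:
                curr[j] = max(prev[j], curr[j - 1])  # carry the best length so far
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """Sentence-level ROUGE-L F1 (beta = 1), assuming lowercase whitespace tokenization."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answer pair, for illustration only
    reference = "the block is dense with mid rise residential buildings"
    candidate = "the block has dense mid rise residential buildings"
    print(round(rouge_l_f1(candidate, reference), 4))
```

For this pair the LCS is 7 tokens, so precision is 7/8 and recall is 7/9, and the script prints 0.8235. Because LCS rewards in-order matches without requiring them to be contiguous, ROUGE-L captures sentence-level structure more loosely than the n-gram overlap measured by BLEU-4.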

Created:

2025-12-20
