
CitySenseGPT: A Vision-Language Model for Semantic Reasoning of the Built Environment via Satellite Imagery

Source: Figshare. Updated: 2025-12-20. Indexed: 2026-04-28.
Download link:
https://figshare.com/articles/dataset/CitySenseGPT_A_Vision-Language_Model_for_Semantic_Reasoning_of_the_Built_Environment_via_Satellite_Imagery/30927182
Description:
Understanding the semantics of the built environment (BE) from satellite imagery remains a fundamental challenge. Existing approaches largely focus on predicting handcrafted BE features, such as land-use labels or visual objects, without enabling deeper semantic reasoning about the BE. To address this limitation, this study introduces CitySenseGPT, a multimodal vision–language model (VLM) that integrates feature prediction with semantic understanding, thereby expanding the spatial reasoning capabilities of AI models for remote sensing. We first construct a multimodal dataset that pairs satellite image tiles with textual descriptions and numeric BE features. We then design standardized BE-VQA tasks, comprising quantitative and qualitative Visual Question-Answering pairs, to elicit semantic interpretations of the built environment. By integrating spatial grounding with multimodal reasoning, CitySenseGPT generates accurate outputs across diverse task categories and outperforms the state-of-the-art baselines LLaVA-1.5, GeoChat, InternVL-3, and VHM. Our experimental results show that CitySenseGPT achieves 86.31% accuracy in feature prediction, exceeding the strongest baseline by 30.12%. It also substantially outperforms all baselines in qualitative semantic understanding, improving BLEU-4 from 0.0044 to 0.266, ROUGE-L from 0.186 to 0.528, and BERTScore from 0.871 to 0.938. Although trained on data from only two US cities, CitySenseGPT generalizes effectively to unseen metropolitan regions. Overall, CitySenseGPT provides a unified multimodal framework that bridges numeric feature prediction and higher-level semantic understanding of the built environment. The results highlight the potential of VLMs as scalable tools for urban planning, geospatial analysis, and data-driven decision-making.
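For readers unfamiliar with the text-overlap metrics quoted above, the following is a minimal Python sketch of a sentence-level ROUGE-L F1 score based on the longest common subsequence (LCS). The description does not state which ROUGE-L variant, tokenizer, or evaluation toolkit the authors used, so the whitespace tokenization, the balanced F1 weighting, the function names, and the example sentences are all illustrative assumptions rather than the study's evaluation code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L F1 (assumed variant: whitespace tokens, balanced F1)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference answer and model answer, invented for illustration.
reference = "dense residential blocks with a regular street grid"
candidate = "residential blocks with an irregular street grid"
score = rouge_l_f1(reference, candidate)
print(round(score, 4))
```

Here the LCS covers five tokens ("residential blocks with ... street grid"), so precision is 5/7 and recall is 5/8. Published ROUGE-L figures normally come from an established evaluation package, whose tokenization and aggregation may differ from this sketch.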
Created:
2025-12-20