SignSpotter: Scene Text-Aware Cross-Modal Semantic Consistency Verification for Street View Images and POI Data
Source: Figshare | Updated: 2026-01-02 | Indexed: 2026-04-28
Download link:
https://figshare.com/articles/dataset/SignSpotter_Scene_Text-Aware_Cross-Modal_Semantic_Consistency_Verification_for_Street_View_Images_and_POI_Data/30985069
Description:
The semantic consistency between street-view images and Points of Interest (POIs) is a critical yet underexplored issue in the quality assessment of Volunteered Geographic Information (VGI) and location-based services. On real-world mapping platforms, inconsistencies frequently arise from outdated POI attributes, visual occlusions, scale variations, and ambiguous textual cues in complex street scenes, which makes reliable validation of geographic entities difficult. This paper proposes SignSpotter, a multi-scale cross-modal framework for entity-level semantic consistency verification between street-view images and POI textual attributes. First, a text-aware visual pre-training strategy is introduced, in which a Vision Transformer (ViT) is fine-tuned on large-scale scene text data to increase its sensitivity to text regions in natural street scenes. Second, an asymmetric dual-tower architecture is constructed to model separately the multi-scale visual semantics of street-view images and the high-level textual semantics of POI attributes, thereby accommodating the pronounced scale heterogeneity inherent in street-view imagery. Finally, a cross-modal semantic reasoning and consistency verification module is developed; it establishes fine-grained semantic associations between visual patches and textual tokens through bidirectional interactions, enabling entity-level semantic consistency inference. Extensive experiments on a large-scale street-view and POI dataset covering multiple cities show that SignSpotter consistently outperforms representative baseline methods in Precision, Recall, and F1-score. Additional cross-city transfer experiments and occlusion robustness analyses further confirm the generalization capability and stability of the proposed framework across diverse urban environments.
The results indicate that incorporating text-aware visual representations and explicit cross-modal semantic reasoning provides an effective solution for POI semantic consistency validation, offering practical implications for geographic data quality control and large-scale map maintenance.
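The abstract describes the verification module only at a high level: visual patches and textual tokens interact bidirectionally, and an entity-level consistency decision follows. As a rough illustration, here is a minimal toy sketch, not the authors' implementation. It assumes patch and token embeddings already live in a shared vector space, omits the ViT and text towers, the learned attention projections, and the multi-scale pooling, and replaces any learned classifier with a hypothetical averaged cosine-agreement score compared against a threshold. The names `consistency_score`, `is_consistent`, `precision_recall_f1` and the 0.5 threshold are illustrative assumptions. The sketch also computes Precision, Recall, and F1 for binary consistent/inconsistent predictions, the metrics named in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot(a, b) / (na * nb)


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: Sequence[Vector], keys: Sequence[Vector]) -> List[Vector]:
    """Scaled dot-product attention; keys double as values (no learned projections)."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        out.append([sum(w * k[i] for w, k in zip(weights, keys)) for i in range(d)])
    return out


def consistency_score(patches: Sequence[Vector], tokens: Sequence[Vector]) -> float:
    """Hypothetical bidirectional score: each modality attends to the other, and we
    average the cosine agreement between every element and its attended context."""
    text_ctx = attend(patches, tokens)    # patch -> text direction
    visual_ctx = attend(tokens, patches)  # text -> patch direction
    p2t = sum(cosine(p, c) for p, c in zip(patches, text_ctx)) / len(patches)
    t2p = sum(cosine(t, c) for t, c in zip(tokens, visual_ctx)) / len(tokens)
    return 0.5 * (p2t + t2p)


def is_consistent(patches: Sequence[Vector], tokens: Sequence[Vector],
                  threshold: float = 0.5) -> bool:
    # Assumed decision rule: a fixed threshold instead of a trained classifier head.
    return consistency_score(patches, tokens) >= threshold


def precision_recall_f1(y_true: Sequence[bool],
                        y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Binary metrics with 'consistent' as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

With toy embeddings, two patches that each match one POI token score close to 1.0 and are flagged consistent, while a single POI token orthogonal to every patch scores 0.0. In real street scenes the patch-to-text term would be diluted by background patches, which is one reason a learned fusion is probably preferable to this unweighted average.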
Created: 2026-01-02



