Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation

Name: Performance and Generalizability Impacts of Incorporating Geolocation into Deep Learning for Dynamic PM2.5 Estimation
Creator: figshare
Published: 2025-06-01 06:09:40
License: No description available

DataCite Commons (updated 2025-06-01, indexed 2026-04-25)

Download link:

https://figshare.com/articles/dataset/Performance_and_Generalizability_Impacts_of_Incorporating_Geolocation_into_Deep_Learning_for_Dynamic_PM2_5_Estimation/29139464/1

Description:

Deep learning models have demonstrated success in geospatial applications, yet the role of geolocation information in enhancing model performance and geographic generalizability remains underexplored and largely unquantified. A new generation of location encoders has emerged with the goal of capturing attributes present at any given location for downstream use in predictive modeling. Because this is a nascent area of research, their evaluation has remained largely limited to static tasks such as species distribution or average temperature mapping. In this paper, we discuss and quantify the impact of incorporating geolocation into deep learning for a real-world application domain that is characteristically dynamic (with fast temporal change) and spatially heterogeneous at high resolutions: estimating surface-level daily PM2.5 levels using remotely sensed and ground-level data. We build on a recently published deep learning-based PM2.5 estimation model that achieves state-of-the-art performance on data observed in the contiguous United States. We examine three approaches for incorporating geolocation: excluding geolocation as a baseline, using raw geographic coordinates, and leveraging pretrained location encoders. We evaluate each approach under within-region (WR) and out-of-region (OoR) evaluation scenarios. Aggregate performance metrics indicate that while naïve incorporation of raw geographic coordinates improves within-region performance by retaining the interpolative value of geographic location, it can hinder generalizability across regions. In contrast, pretrained location encoders like GeoCLIP enhance predictive performance and geographic generalizability in both WR and OoR scenarios. However, our qualitative analysis reveals artifact patterns caused by high-degree basis functions and sparse upstream samples in certain areas, and our ablation results indicate that performance varies among location encoders such as SatCLIP and GeoCLIP.
To the best of our knowledge, this is the first integration and systematic evaluation of location encoders in a complex, temporally dynamic estimation scenario. In addition to guiding better model development for air pollution estimation and location encoders, this study provides insights for effective incorporation of location into deep learning for geospatial predictive tasks.
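
The three geolocation treatments and the two evaluation scenarios named in the description can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the dataset or the paper: the function names, the mode labels ("none", "raw", "encoder"), the region labels, and the random hold-out used for the within-region split. The sin/cos encoder is a hypothetical stand-in for a pretrained location encoder; it is not GeoCLIP or SatCLIP.

```python
import math
import random


def sincos_location_encoder(lat_deg, lon_deg, num_freqs=4):
    """Hypothetical stand-in for a pretrained location encoder.

    Maps one (lat, lon) pair in degrees to sin/cos features at
    doubling frequencies. Returns a list of length 4 * num_freqs.
    """
    feats = []
    for value in (math.radians(lat_deg), math.radians(lon_deg)):
        for k in range(num_freqs):
            feats.append(math.sin((2 ** k) * value))
            feats.append(math.cos((2 ** k) * value))
    return feats


def build_input(predictors, lat_deg, lon_deg, mode):
    """Assemble one model input row under a geolocation treatment.

    mode == "none":    predictors only (baseline, no geolocation)
    mode == "raw":     predictors plus raw latitude and longitude
    mode == "encoder": predictors plus location-encoder features
    """
    row = list(predictors)
    if mode == "none":
        return row
    if mode == "raw":
        return row + [lat_deg, lon_deg]
    if mode == "encoder":
        return row + sincos_location_encoder(lat_deg, lon_deg)
    raise ValueError(f"unknown mode: {mode!r}")


def within_region_split(n_samples, test_fraction, seed=0):
    """Assumed WR protocol: random hold-out, so train and test
    samples come from the same geographic area."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def out_of_region_split(region_ids, held_out_region):
    """Assumed OoR protocol: every sample from one region is held
    out, so the test area is never seen during training."""
    train = [i for i, r in enumerate(region_ids) if r != held_out_region]
    test = [i for i, r in enumerate(region_ids) if r == held_out_region]
    return train, test
```

With three predictors per sample, `build_input(..., "none")` yields 3 values, `"raw"` yields 5, and `"encoder"` yields 19 under the default `num_freqs=4`. Raw coordinates let a model interpolate between nearby training sites, which fits the reported within-region gain but gives it nothing to rely on in a held-out region; that is the contrast the two split functions are meant to expose.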

Provider:

figshare

Created:

2025-05-23
