Population Distribution Dataset of the Tibetan Plateau Based on Multi-Source Remote Sensing Data and Ensemble Machine Learning (2020)
Source: National Tibetan Plateau Data Center. Updated 2025-04-01; indexed 2025-04-12.
Download link:
https://data.tpdc.ac.cn/zh-hans/data/98d72544-049a-4646-8840-b0d58550c05b
Description:
This dataset maps the spatial distribution of population on the Tibetan Plateau in 2020. It was built from multi-source remote sensing data (topography, slope, Normalized Difference Vegetation Index (NDVI), night-time light, building height, roads, impervious surfaces, Tencent location data, and others) using ensemble machine learning models. The geographic coordinate system is GCS_WGS_1984, the projected coordinate system is WGS_1984_UTM_Zone_48N, and all data are resampled to a spatial resolution of 100 m.
Three machine learning algorithms, Random Forest (RF), Gradient Boosting Decision Tree (GBDT), and eXtreme Gradient Boosting (XGBoost), were first trained on the multi-source remote sensing data to produce preliminary predictions. A stacking ensemble was then used to offset the errors of the individual models and improve the accuracy of the final result: RF, GBDT, and XGBoost serve as the base models, and a multiple linear regression model serves as the meta-model.
The dataset was compared with the results of the Seventh National Population Census of China at the township level, giving a root mean square error (RMSE) of 4094.47, lower than that of the widely used WorldPop dataset (RMSE = 7345.07). It can serve as baseline data for other studies of the Tibetan Plateau.
Provided by:
Zhang Huiming (张慧铭), Yang Xuchao (杨续超)
Created:
2025-03-29
Collected Summary
Dataset Introduction

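Background and Challenges
Background Overview
Using multi-source remote sensing data and ensemble machine learning, this dataset maps the population distribution of the Tibetan Plateau in 2020 at 100 m resolution. Predictions from the RF, GBDT, and XGBoost algorithms are combined through a stacking ensemble. The resulting township-level RMSE against census data is 4094.47, compared with 7345.07 for the widely used WorldPop dataset. The dataset is suited to research on the Tibetan Plateau.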
The content above was collected and summarized by 遇见数据集.
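The township-level comparison against census counts can be sketched as follows. This is an illustration only, assuming that gridded population values are summed within each township and then compared with the census total for that township. The function names, township IDs, and numbers are made up and are not taken from the dataset.

```python
import math
from collections import defaultdict


def township_totals(cell_population, cell_township):
    """Sum gridded population values by township ID."""
    totals = defaultdict(float)
    for pop, town in zip(cell_population, cell_township):
        totals[town] += pop
    return dict(totals)


def rmse(predicted, observed):
    """Root mean square error over the townships listed in `observed`."""
    squared_errors = [
        (predicted.get(town, 0.0) - census_pop) ** 2
        for town, census_pop in observed.items()
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical 100 m cells with their predicted population and township ID.
cells = [120.0, 80.0, 40.0, 10.0, 5.0, 300.0]
towns = ["A", "A", "B", "B", "B", "C"]

# Hypothetical census totals per township.
census = {"A": 210.0, "B": 50.0, "C": 290.0}

totals = township_totals(cells, towns)
score = rmse(totals, census)
print(totals, round(score, 2))
```

Because the error is computed on township totals, it is expressed in persons per township, so it depends on township size as well as on model quality.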



