Dataset.

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://figshare.com/articles/dataset/Dataset_/29406409
Description:
Population forecasts can provide effective data support for social and economic planning and decision-making, and accurate forecasts at the sub-national level are especially valuable. With the broader aim of efficient smart population management, this research focuses primarily on a machine-learning-based combination model for forecasting demographic data. Because high population density and mobility lead to larger errors in population forecasts, a dynamic monitoring method based on mobile communication big data, such as mobile phone signals, is proposed. Combined with more structurally stable traditional statistical data, it forms a multi-source dataset that is both accurate and timely. In the study, the Extreme Gradient Boosting (XGBoost) model is used as the base model to build a reliable predictive model for dynamic population monitoring. The sparrow search algorithm (SSA) is applied to find more reasonable XGBoost parameters and so improve forecast accuracy. The combination model is verified on data from the 6th and 7th national population censuses and mobile phone signal data in Hebei Province, yielding predicted mortality and migration data for the following year, categorized by age and gender. Subsequently, the research compares the performance of different metaheuristic algorithms and various gradient-boosting machine-learning models on the dataset. The SSA-XGBoost model shows better prediction performance on the demographic data forecast, with a higher R² of 0.9984, a lower mean absolute error of 0.0002, and a mean squared error of 6.9184. The results of the comparative experiments and cross-validation show that the proposed predictive model can effectively forecast demographic data for sub-national regions and support smart population management.
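
The description reports three evaluation metrics: R², mean absolute error, and mean squared error. As a minimal sketch of how such metrics are conventionally computed, the plain-Python functions below use the standard textbook definitions. The function names and the sample values are illustrative assumptions, not taken from the dataset or from the authors' code.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

print(mean_absolute_error(observed, predicted))
print(mean_squared_error(observed, predicted))
print(r2_score(observed, predicted))
```

All three functions assume two equal-length, non-empty sequences, and `r2_score` additionally assumes the observed values are not all identical, since the total sum of squares would otherwise be zero.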

Created:
2025-06-25