1 km Population Dataset for South Asia (640-2020 CE)

Name: 1 km Population Dataset for South Asia (640-2020 CE)
Creator: 李士成,黄艳巧
Published: 2025-04-09 11:18:00
License: No description available

National Tibetan Plateau Data Center. Updated 2025-04-09. Indexed 2025-04-19.

Download link:

https://data.tpdc.ac.cn/zh-hans/data/dd98b1fb-6f49-4a71-8b23-51ef22ae3a7d

Resource description:

South Asia is one of the most densely populated regions in the world. This dataset draws on a comprehensive collection of historical sources and earlier research on South Asia's population (see the data description document and references for details) to verify and estimate the population of South Asia (present-day India, Pakistan, Nepal, and Bangladesh) from 640 to 1801 CE. These estimates are linked with the British India census data for 1871-1941 (Nepal's figures come from Nepal's own census) and the United Nations World Population Prospects data for 1950-2020. The result is a population series for South Asia covering 22 time slices between 640 and 2020: 640, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1595, 1750, 1801, 1871, 1901, 1921, 1941, 1960, 1980, 2000, 2010, and 2020.

The dominant environmental factors influencing the spatial distribution of population were then selected with Geodetector, historical settlement distribution data were collected (see the data description document and references for details), and a random forest regression model was used to spatialize the population totals. After excluding uninhabited areas such as water bodies, glaciers, and bare or unused land, and delimiting the maximum extent of historical population distribution, a 1 km resolution population dataset for South Asia from 640 to 2020 CE was produced. The model was validated with leave-one-out cross-validation (LOOCV) and reached an explained variance of 0.81, which indicates good accuracy.

Compared with the existing HYDE historical population dataset, this study incorporates more historical sources and more recent research when estimating historical population totals. In the random forest spatial simulation, it also accounts for changes in South Asian settlements over the past millennium, whereas HYDE considers only natural factors and treats them as constant over time. The authors therefore regard this dataset as more reliable than HYDE and better suited to revealing the spatio-temporal dynamics of South Asia's historical population. It is intended as foundational data for research on the long-term evolution of human-land relationships, climate change attribution, and ecological conservation in South Asia.
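
The factor-selection step mentioned above relies on Geodetector, whose factor detector scores each candidate factor with a q-statistic, q = 1 - Σ(N_h · σ_h²) / (N · σ²), where the cells are grouped into strata h by the factor's classes. The following is a minimal sketch of that statistic only. The toy density values, the factor names (`elevation_class`, `soil_class`), and the function names are hypothetical illustrations and are not taken from the dataset or its documentation.

```python
from collections import defaultdict


def population_variance(values):
    """Population variance (divide by N), as used in the q-statistic."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def geodetector_q(y, strata):
    """Factor-detector q-statistic: 1 - sum(N_h * var_h) / (N * var)."""
    if len(y) != len(strata):
        raise ValueError("y and strata must have the same length")
    total_var = population_variance(y)
    if total_var == 0:
        raise ValueError("y has zero variance; q is undefined")
    # Group the response values by the class label of the candidate factor.
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    within = sum(len(g) * population_variance(g) for g in groups.values())
    return 1.0 - within / (len(y) * total_var)


# Hypothetical toy data: density in six cells and two candidate factors.
density = [10.0, 12.0, 11.0, 50.0, 52.0, 51.0]
elevation_class = ["high", "high", "high", "low", "low", "low"]
soil_class = ["a", "b", "a", "b", "a", "b"]

q_elevation = geodetector_q(density, elevation_class)
q_soil = geodetector_q(density, soil_class)
print(f"q(elevation) = {q_elevation:.3f}, q(soil) = {q_soil:.3f}")
```

In this toy case the elevation classes separate low-density from high-density cells almost perfectly, so their q is close to 1, while the soil classes explain little of the variance. A factor with a higher q would be a stronger candidate predictor for the spatialization model.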

Provider:

李士成,黄艳巧

Created:

2025-02-07

Collected summary

Dataset introduction

Background and challenges

Background overview

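This dataset provides 1 km resolution population data for South Asia from 640 to 2020 CE, covering India, Pakistan, Nepal, and Bangladesh. It spans 1,380 years in 22 time slices. Population totals were spatialized with a random forest regression model after uninhabited areas were excluded, and the model performs well (explained variance of 0.81). The dataset is described as more reliable than the existing HYDE dataset and is suited to research on the long-term evolution of human-land relationships, climate change attribution, and ecological conservation in South Asia.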

The above content was collected and summarized by 遇见数据集.
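
The spatialization described in the summary, where regional totals are distributed over 1 km cells and uninhabited cells are excluded, can be sketched as a proportional (dasymetric) allocation. This is an assumption about the general approach: the source does not state how model predictions are converted to per-cell counts. The function name `allocate_population`, the weights, and the mask values are hypothetical, and the weights stand in for whatever the random forest model would predict.

```python
def allocate_population(total, weights, habitable):
    """Distribute a regional total over cells in proportion to model weights.

    Cells flagged as uninhabitable (e.g. water, glacier, bare land) get zero,
    and the allocated values always sum back to the regional total.
    """
    if len(weights) != len(habitable):
        raise ValueError("weights and habitable must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    # Zero out the weight of every cell outside the habitable mask.
    masked = [w if ok else 0.0 for w, ok in zip(weights, habitable)]
    weight_sum = sum(masked)
    if weight_sum <= 0:
        raise ValueError("no habitable cell with positive weight")
    return [total * w / weight_sum for w in masked]


# Hypothetical example: four 1 km cells in one region, third cell is water.
predicted_weights = [4.0, 1.0, 3.0, 2.0]
habitable = [True, True, False, True]
cells = allocate_population(7000.0, predicted_weights, habitable)
print(cells)  # [4000.0, 1000.0, 0.0, 2000.0]
```

Normalizing by the sum of masked weights keeps each region's gridded population consistent with its estimated total, which is the property that lets the 22 historical totals carry through to the 1 km grids.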