2000–2020 Monthly Air Quality Index (AQI) Dataset of China at 1 km Spatial Resolution
DataCite Commons. Updated 2025-10-02. Indexed 2026-05-03.
Download link:
https://figshare.com/articles/dataset/2000_2020_Monthly_Air_Quality_Index_AQI_Dataset_of_China_at_1_km_Spatial_Resolution/29975356/1
Description:
This dataset provides monthly gridded Air Quality Index (AQI) data covering the entire territory of China from 2000 to 2020 at a spatial resolution of 1 km. The data were generated to support research on the associations between long-term and seasonal air pollution exposure and cardiovascular disease (CVD) risk in Chinese older adults (aged ≥65 years), as part of a study using the China Health and Retirement Longitudinal Study (CHARLS, 2011–2020) cohort. The dataset captures fine-scale spatial and temporal variations in air quality across China, so that environmental exposure can be linked precisely to individual health outcomes.
The AQI was calculated according to China's national standard (GB 3095–2018) as the maximum index among six criteria pollutants (PM₂.₅, PM₁₀, SO₂, CO, NO₂, O₃). Eighteen predictors were integrated to ensure accuracy, including meteorological variables (e.g., 2-m air temperature and 10-m wind speed from the China Meteorological Forcing Dataset), vegetation metrics (Normalized Difference Vegetation Index [NDVI], Net Primary Productivity [NPP]), anthropogenic factors (downscaled GDP, population density, Human Footprint Index), and soil properties (pH and soil organic carbon from China's High-Resolution National Soil Information Grid).
Four tree-based ensemble algorithms (Random Forest [RF], Gradient Boosting Machine [GBM], CatBoost, XGBoost) were compared. The RF model was selected as optimal (test set: R² = 0.83, Root Mean Square Error [RMSE] = 10.25, Mean Absolute Error [MAE] = 9.03) after validation via 10-fold geographically stratified cross-validation and 100 bootstrap iterations. Recursive Feature Elimination (RFE) then narrowed the predictor set to 14 core predictors to minimize overfitting.
The dataset is provided as NCnet files (252 in total, one per month) covering China (80°E–135°E, 15°N–53°N).
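The "maximum index among six criteria pollutants" rule in the description can be illustrated with a minimal Python sketch. It assumes the per-pollutant sub-indices have already been computed, since the listing does not say how that step was done. The function name `overall_aqi` and the example values are hypothetical and are not taken from the dataset.

```python
# Six criteria pollutants named in the dataset description.
POLLUTANTS = ("PM2.5", "PM10", "SO2", "CO", "NO2", "O3")


def overall_aqi(sub_indices):
    """Return (AQI, dominant pollutant) as the maximum of the per-pollutant sub-indices.

    `sub_indices` maps each pollutant name to its already-computed sub-index.
    """
    missing = [p for p in POLLUTANTS if p not in sub_indices]
    if missing:
        raise ValueError(f"missing sub-indices: {missing}")
    # The overall index is the largest sub-index; the pollutant that
    # produces it is the dominant one for that cell and month.
    dominant = max(POLLUTANTS, key=lambda p: sub_indices[p])
    return sub_indices[dominant], dominant


# Hypothetical sub-index values for one grid cell and month (illustrative only).
example = {"PM2.5": 82, "PM10": 64, "SO2": 12, "CO": 9, "NO2": 31, "O3": 47}
aqi, dominant = overall_aqi(example)
print(aqi, dominant)
```

Taking the maximum means the reported AQI reflects only the worst pollutant in a cell, so two cells with the same AQI can differ in which pollutant drives it.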
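To link an individual's location to a grid cell, as the description intends, one possible approach is a nearest-cell lookup over the stated bounding box. The sketch below makes several assumptions: a regular latitude/longitude grid, a cell size of 0.01° standing in for roughly 1 km, and row 0 at the northern edge. None of these are confirmed by the listing, so check them against the coordinate variables in the actual files. The function name `cell_index` is hypothetical.

```python
import math

# Coverage stated in the dataset description.
LON_MIN, LON_MAX = 80.0, 135.0
LAT_MIN, LAT_MAX = 15.0, 53.0

# ASSUMPTION: regular grid with 0.01-degree cells (about 1 km); not confirmed by the source.
CELL_DEG = 0.01


def cell_index(lon, lat, cell=CELL_DEG):
    """Return (row, col) of the grid cell containing the point.

    ASSUMPTION: row 0 is the northernmost row and column 0 the westernmost column.
    """
    if not (LON_MIN <= lon < LON_MAX and LAT_MIN < lat <= LAT_MAX):
        raise ValueError("point lies outside the stated coverage")
    col = math.floor((lon - LON_MIN) / cell)
    row = math.floor((LAT_MAX - lat) / cell)
    return row, col


# Example: a point in Beijing (coordinates chosen for illustration).
print(cell_index(116.405, 39.905))
```

With 252 monthly files, the same (row, col) pair can be reused across every month to build an exposure time series for one location.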
Provider:
figshare
Created:
2025-10-02



