CN-AEBench: Hourly Atmospheric Environmental Dataset for China (2023–Present) by Fusing Station Monitoring Data and ECMWF IFS Data
Source: DataCite Commons | Updated: 2025-11-19 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=e3fe59359d584d2aa99f0f24611dcafe
Summary:
Atmospheric and environmental research critically depends on datasets that integrate atmospheric and environmental variables to reflect coupled atmosphere-environment processes, yet existing resources remain fragmented across isolated monitoring networks. To address this challenge, we present CN-AEBench—the first large-scale spatiotemporal dataset for China that effectively fuses surface observations from 2,267 meteorological stations, dual-level monitoring data from 2,583 environmental stations, and ECMWF Integrated Forecasting System (IFS) data. Recognizing that different environmental variables exhibit distinct spatiotemporal characteristics and require tailored fusion strategies, we develop an element-specific fusion framework that strategically combines Inverse Distance Weighting (IDW), LightGBM, and Geographically Weighted Regression (GWR), significantly enhancing the fusion accuracy. Spanning from September 2023 to the present with continuous daily updates, CN-AEBench provides hourly data for 8 static and 41 dynamic atmospheric-environmental variables, offering high variable diversity and density. 
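The abstract names Inverse Distance Weighting (IDW) as one component of the element-specific fusion framework. Below is a minimal, generic IDW sketch, not the authors' implementation. It estimates a value at a target point as the mean of station observations weighted by 1/d^p. Several choices are assumptions made for illustration: the planar (x, y) coordinates, the default exponent `power=2`, and the function name `idw`. For latitude/longitude input, a great-circle distance would be more appropriate. How CN-AEBench assigns IDW, LightGBM, or GWR to each variable is not shown here.

```python
import math
from typing import Sequence, Tuple


def idw(
    stations: Sequence[Tuple[float, float, float]],
    target: Tuple[float, float],
    power: float = 2.0,
) -> float:
    """Generic Inverse Distance Weighting estimate at `target`.

    stations: (x, y, value) triples in a planar coordinate system (assumption).
    power: distance exponent p; a larger p gives nearer stations more weight.
    """
    if not stations:
        raise ValueError("need at least one station")
    tx, ty = target
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        d = math.hypot(x - tx, y - ty)
        if d == 0.0:
            # Target coincides with a station: use that observation directly.
            return value
        w = 1.0 / d ** power  # weight decays with distance
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

For example, `idw([(0, 0, 10.0), (3, 0, 40.0)], (1, 0))` weights the nearer station four times as heavily as the farther one and returns 16.0.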
The dataset features a flexible three-tier architecture that supports diverse research. It progresses from raw observations (L1) to spatiotemporally aligned data (L2) and finally to the fully integrated multi-source product (L3). For more information, including the variable list, studies of dataset validity on extreme events, and visualizations, please visit the official repository: https://github.com/AIWeather126/CN-AEBench
Data Coverage: 2023.09.01 00 to 2025.10.31 23 (in ScienceDB and the official repository); later data (in the official repository only)
Resolution: 1 hour
Area: Most of China
Note: Data for time-sensitive tasks is uploaded automatically to our official HuggingFace repository (https://huggingface.co/datasets/AIWeather126/CN-AEBench) every day at around 12:30 and 21:30.

1. Detailed L3 Description
CN-AEBench L3 data is designed specifically for building end-to-end intelligent forecasting models and is currently at version 1.0.0-rc.1. For checkpoints or subsets of the data, please contact us by email (ai_weather@126.com). To keep the benchmark stable and research results comparable, we release new versions only when significant improvements are made to accommodate new weather and environmental changes. Each release carries a clear version number.

2. Detailed L1 & L2 Description

2.1. CN-AEBench-L1 Description
CN-AEBench L1 contains quality-controlled raw observational data. It is primarily designed for fundamental research applications, including NWP data assimilation and gridding of observational data.
Usage Guidelines
CountryEnv - National Environmental Monitoring Station Data
  - Historical data: data before 2025.11.01 are batch processed and archived as `CountryEnv-L1.zip`; data from 2025.11.01 onward are rolling updates.
  - Organization: daily files with the naming convention `YYYY_MM_dd_HH.csv`
  - File structure: rows are individual station records; columns are environmental parameters (AQI, CO, NO2, O3, PM10, PM2.5, SO2)
ProvinceEnv - Provincial Environmental Monitoring Station Data
  - Format: compressed Parquet files, compatible with pandas
  - Naming convention: `ProvinceEnv-L1.parquet`
  - Processing: supports direct pandas DataFrame operations
Atmo - Meteorological Observation Data
  - Historical data: data before 2025.11.01 are batch processed and archived as `Atmo-L1.parquet`; data from 2025.11.01 onward are rolling updates.
  - Organization: daily files with the naming convention `YYYY_MM_dd_HH.csv`
  - File structure: rows are individual station records; columns are atmospheric variables
NWP - Numerical Weather Prediction Data
  - Raw forecast data are not included in this repository. Users can obtain L1-NWP products by emailing us or directly from https://apps.ecmwf.int.

2.2. CN-AEBench-L2 Description
CN-AEBench L2 builds on L1 with spatiotemporal alignment, missing-data imputation, model data registration, and diagnostic variable computation. It is designed for domain-adaptive pre-training, statistical analysis, event characterization, and sequence interpolation tasks.
Usage Guidelines
CountryEnv - National Environmental Monitoring Station Data
  - Historical data: data before 2025.11.01 are batch processed and consolidated in `CountryEnv-L2.parquet`; data from 2025.11.01 onward are rolling updates.
  - Organization: daily files with the naming convention `YYYY_MM_dd_HH.csv`
  - File structure: rows are individual station records; columns are environmental parameters (AQI, CO, NO2, O3, PM10, PM2.5, SO2)
ProvinceEnv - Provincial Environmental Monitoring Station Data
  - Format: compressed Parquet files, compatible with pandas
  - Naming convention: `ProvinceEnv-L2.parquet`
  - Processing: supports direct pandas DataFrame operations
Atmo - Meteorological Observation Data
  - Historical data: data before 2025.11.01 are batch processed and consolidated in `Atmo-L2.parquet`; data from 2025.11.01 onward are rolling updates.
  - Organization: daily files with the naming convention `YYYY_MM_dd_HH.csv`
  - File structure: rows are individual station records; columns are meteorological variables
NWP - Numerical Weather Prediction Data
  - Format: compressed Parquet files, compatible with pandas
  - Historical data: data before 2025.11.01 are batch processed and consolidated in `NWP-L2.parquet`; data from 2025.11.01 onward are rolling updates.
  - Organization: daily files with the naming convention `nwp_YYYYMMdd.parquet`
  - Processing: supports direct pandas DataFrame operations
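The file layout in the usage guidelines can be sketched as a small lookup. It maps a product, level, and timestamp to the file expected to hold that data: batch archives before 2025.11.01, rolling-update files afterwards. This is a hypothetical helper, not part of any CN-AEBench tooling, and it rests on several assumptions:

- the names `expected_file`, `load`, `ARCHIVES`, and `ROLLING_START` are illustrative;
- date fields in `YYYY_MM_dd_HH.csv` and `nwp_YYYYMMdd.parquet` are zero-padded;
- timestamps use the same time zone as the file names, which the description does not state.

ProvinceEnv is left out because it ships as one Parquet file per level. L1 NWP is left out because it is not included in the repository. Reading uses `pandas.read_csv` and `pandas.read_parquet`; the latter needs pyarrow or fastparquet installed.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical constant: first timestamp covered by rolling daily updates.
ROLLING_START = datetime(2025, 11, 1)

# Batch archive names for data before 2025.11.01, keyed by (product, level).
ARCHIVES = {
    ("CountryEnv", "L1"): "CountryEnv-L1.zip",
    ("Atmo", "L1"): "Atmo-L1.parquet",
    ("CountryEnv", "L2"): "CountryEnv-L2.parquet",
    ("Atmo", "L2"): "Atmo-L2.parquet",
    ("NWP", "L2"): "NWP-L2.parquet",
}


def expected_file(product: str, level: str, ts: datetime) -> str:
    """Return the file name expected to contain `product` at `level` for time `ts`."""
    key = (product, level)
    if key not in ARCHIVES:
        raise ValueError(f"no file layout described for {product} {level}")
    if ts < ROLLING_START:
        return ARCHIVES[key]
    if product == "NWP":
        # L2 NWP rolling updates: one file per day, `nwp_YYYYMMdd.parquet`.
        return ts.strftime("nwp_%Y%m%d.parquet")
    # Station rolling updates: `YYYY_MM_dd_HH.csv` (zero-padded fields assumed).
    return ts.strftime("%Y_%m_%d_%H.csv")


def load(path: Path):
    """Read one CSV or Parquet file into a pandas DataFrame."""
    import pandas as pd  # imported lazily so the name helper works without pandas

    if path.suffix == ".csv":
        return pd.read_csv(path)
    if path.suffix == ".parquet":
        return pd.read_parquet(path)
    # e.g. CountryEnv-L1.zip: unpack the archive first, then load its files.
    raise ValueError(f"unsupported file type: {path.name}")


if __name__ == "__main__":
    ts = datetime(2025, 11, 3, 7)
    print(expected_file("CountryEnv", "L1", ts))  # 2025_11_03_07.csv
    print(expected_file("NWP", "L2", ts))  # nwp_20251103.parquet
    print(expected_file("Atmo", "L2", datetime(2024, 5, 1, 12)))  # Atmo-L2.parquet
```

Joining a local download directory with the returned name and passing it to `load` would yield a DataFrame whose rows are station records, as described in the file-structure notes.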
Provider:
Science Data Bank
Created:
2025-11-19



