Machine Learning for Hourly Air Pollution Prediction in England (ML-HAPPE)
Source: DataCite Commons. Updated: 2025-05-01. Indexed: 2025-05-18.
Download link:
https://catalogue.ceda.ac.uk/uuid/fc735f9878ed43e293b85f85e40df24d
Description:
This dataset contains estimates of air pollution levels across England for every hour of 2018. It covers seven major air pollutants that can affect human health and the environment: Nitrogen Dioxide (NO2), Nitric Oxide (NO), Nitrogen Oxides (NOx), Ozone (O3), Particulate Matter smaller than 10 micrometres (PM10) and smaller than 2.5 micrometres (PM2.5), and Sulphur Dioxide (SO2). For each pollutant, concentrations are predicted as a mean value and also at the 5th percentile (lower), 50th percentile (median), and 95th percentile (upper), to show both typical and potential extreme pollution scenarios.
The dataset covers the whole of England on an evenly spaced grid in which each square measures 1 km x 1 km. Data points correspond to the centres of these grid squares. The England-wide predictions are hourly for the whole of 2018, covering all 365 days and 24 hours of each day. Also included are the training data, drawn from real-world ambient air pollution monitoring stations for the period 2014-2018, and the models used to make the predictions.
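To make the temporal and spatial structure described above concrete, here is a minimal Python sketch. It builds the hourly time axis for 2018 and snaps a point to the centre of its 1 km grid square. The coordinate convention (projected coordinates in metres, with cell edges on multiples of 1000 m) and the function name `cell_centre` are illustrative assumptions; the actual file layout and coordinate reference system should be checked in the dataset's README.md.

```python
from datetime import datetime, timedelta

# Hourly timestamps covering 2018 (not a leap year): 365 days x 24 hours.
start = datetime(2018, 1, 1, 0)
end = datetime(2019, 1, 1, 0)
n_hours = int((end - start) / timedelta(hours=1))
timestamps = [start + timedelta(hours=i) for i in range(n_hours)]

CELL_SIZE_M = 1000  # each grid square is 1 km x 1 km


def cell_centre(easting_m: float, northing_m: float) -> tuple[float, float]:
    """Return the centre of the 1 km cell containing a point.

    Assumes projected coordinates in metres with cell edges aligned to
    multiples of 1000 m. This alignment is an illustrative assumption,
    not something taken from the dataset itself.
    """
    x0 = (easting_m // CELL_SIZE_M) * CELL_SIZE_M
    y0 = (northing_m // CELL_SIZE_M) * CELL_SIZE_M
    return (x0 + CELL_SIZE_M / 2, y0 + CELL_SIZE_M / 2)


print(n_hours)                           # 8760
print(timestamps[-1])                    # 2018-12-31 23:00:00
print(cell_centre(291_950.0, 92_300.0))  # (291500.0, 92500.0)
```

Each pollutant therefore has 8760 hourly values per grid square, one for each timestamp in the list.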
These pollution estimates were produced using a supervised machine learning method, which is a computational approach where algorithms are trained to identify patterns in historical data and apply these learned patterns to predict new data points. The predictions incorporated various environmental factors, including weather conditions (e.g., temperature, wind, precipitation), human activities (traffic patterns), satellite measurements, land-use types (urban, rural, industrial areas), and emission inventories (datasets detailing pollutants released into the atmosphere). Additionally, the dataset provides uncertainty intervals through percentile-based estimates, giving users insights into the reliability of the predictions.
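As a rough illustration of how the percentile estimates can be read as an uncertainty interval, the sketch below treats the 5th and 95th percentiles as the bounds of a central 90% interval for one grid square and hour. The field names (`p05`, `p50`, `mean`, `p95`), the numeric values, and the threshold are all made up for illustration and are not taken from the dataset.

```python
# Hypothetical prediction for one grid square and one hour, in arbitrary
# concentration units. Field names and values are illustrative only.
record = {"p05": 8.0, "p50": 14.5, "mean": 15.2, "p95": 27.0}


def interval_width(rec: dict) -> float:
    """Width of the central 90% interval (95th minus 5th percentile)."""
    return rec["p95"] - rec["p05"]


def relative_width(rec: dict) -> float:
    """Interval width relative to the median; larger means less certain."""
    return interval_width(rec) / rec["p50"]


def may_exceed(rec: dict, threshold: float) -> bool:
    """True if the upper (95th percentile) estimate is above a threshold."""
    return rec["p95"] > threshold


print(interval_width(record))    # 19.0
print(may_exceed(record, 25.0))  # True
print(may_exceed(record, 30.0))  # False
```

A wide interval relative to the median suggests that the prediction for that square and hour should be used with more caution, for example in areas far from any monitoring station.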
This dataset was created to provide access to detailed, high-resolution estimates of actual air pollution concentrations across England. Unlike simpler models or general air quality scenarios, this dataset offers hour-by-hour predictions of air pollution levels at a fine spatial scale (1 km x 1 km), delivering a realistic and actionable understanding of air quality patterns at a resolution not previously available. By providing detailed, hourly estimates based on real-world environmental conditions and emissions data, the dataset makes it possible to support evidence-based decision-making and address essential challenges in regions where direct pollution measurements may not be available.
The dataset was created by Liam J. Berrisford at the University of Exeter during his PhD studies, supported by the UK Research and Innovation (UKRI) Centre for Doctoral Training in Environmental Intelligence. Full methodological details and data validation information are available in the associated open-access scientific publication. For more information about the data, see the README.md archived alongside this dataset.
This dataset provides hourly predictions of air pollution concentrations for England throughout the year 2018. These values are not direct measurements from monitoring stations but rather model-based estimates generated using a supervised machine-learning approach. The model was trained using real observations from the UK's national monitoring network, but it is capable of making predictions even in areas without any nearby monitoring stations. This means that the dataset offers complete spatial and temporal coverage, filling in gaps where no sensor data exists. This dataset focuses exclusively on England for the year 2018 and does not include data for other years or regions of the UK.
Provider:
NERC EDS Centre for Environmental Data Analysis
Created:
2025-05-01



