Air Quality and Meteorological Dataset from Monitoring Stations in Salvador, Brazil, 2011–2016

Source: Mendeley Data; indexed 2026-05-21
Download link:
https://data.mendeley.com/datasets/rmtgrf7w45
Description:
This dataset contains hourly air quality and meteorological measurements collected from air quality monitoring stations in Salvador, Bahia, Brazil, covering the period from 2011 to 2016. The data were obtained from the monitoring network operated by CETREL S.A. and include records from eight monitoring stations: Av. ACM–Detran (ACM), Av. Barros Reis (BR), Paralela–CAB (CAB), Campo Grande (CG), Dique do Tororó (DT), Itaigara (IT), Pirajá (PI), and Rio Vermelho (RV). The dataset was used in the study entitled “Interpretação de Poluentes e Variáveis Meteorológicas por Meio de Modelos Explicáveis de Aprendizado de Máquina”, which investigates atmospheric patterns in Salvador using supervised machine learning and explainable artificial intelligence techniques.
All pollutant concentrations and meteorological variables are reported as hourly averages. The pollutant variables are carbon monoxide (CO), sulfur dioxide (SO2), ozone (O3), nitrogen oxides (NO, NO2, and NOX), and particulate matter (MP). The meteorological variables are relative humidity (HUM), air temperature (TEMP), rainfall (RAIN), wind speed (WIND_SPEED), standard deviation of wind direction (STWD), and wind direction expressed as sine and cosine components.
The final processed dataset contains 193,569 samples and 16 columns: timestamp, monitoring station identifier, the pollutant variables, the meteorological variables, and the two trigonometric wind direction components. Preprocessing consisted of removing missing or inconsistent records, treating outliers in the pollutant variables with an interquartile range criterion, and transforming the angular wind direction variable into sine and cosine components to avoid the discontinuities associated with circular data.
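The two preprocessing transformations named above can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the description does not state the fence multiplier or whether outliers were dropped or clipped, so the conventional 1.5 × IQR fences and simple removal are assumptions here.

```python
import math
import statistics


def wind_direction_to_components(degrees):
    """Map a wind direction in degrees to (sine, cosine) components.

    Directions such as 350 and 10 degrees end up close together,
    which a raw angle in [0, 360) would not reflect.
    """
    radians = math.radians(degrees % 360.0)
    return math.sin(radians), math.cos(radians)


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR.

    k = 1.5 is the common default, assumed here; the dataset
    description does not give the value actually used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_iqr_outliers(values, k=1.5):
    """Keep only the values inside the IQR fences (removal is an assumption)."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the dataset.
    print(wind_direction_to_components(350.0))
    print(wind_direction_to_components(10.0))
    print(drop_iqr_outliers([10, 12, 11, 13, 12, 95]))
```

In the illustrative list, the value 95 falls outside the fences and is dropped, while the components for 350 and 10 degrees differ only in the sign of the sine term.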
This dataset can support studies on urban air quality, atmospheric pollution, environmental monitoring, machine learning, spatial characterization of monitoring stations, and explainable artificial intelligence applied to environmental data. In the associated article, the dataset was used to train a Random Forest classifier for monitoring station classification and to apply SHAP-based explainability analysis, allowing the identification of relevant pollutant and meteorological variables associated with spatial atmospheric patterns in Salvador. The data are suitable for reproducibility studies, benchmarking of machine learning models, exploratory analysis of pollutant and meteorological relationships, and development of interpretable models for air quality assessment.
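A station classification setup of the kind described above might be sketched as follows, assuming scikit-learn is available. The function name, the hyperparameters, and the `WD_SIN`/`WD_COS` column names are hypothetical, and the arrays are random placeholders standing in for the real file, so the printed accuracy is meaningless. The sketch reports the forest's built-in impurity-based importances, which are a simpler stand-in and not the SHAP analysis used in the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Feature columns as listed in the description; the two wind direction
# component names are assumed, since the description does not give them.
FEATURES = ["CO", "SO2", "O3", "NO", "NO2", "NOX", "MP",
            "HUM", "TEMP", "RAIN", "WIND_SPEED", "STWD",
            "WD_SIN", "WD_COS"]
STATIONS = ["ACM", "BR", "CAB", "CG", "DT", "IT", "PI", "RV"]


def fit_station_classifier(X, y, seed=0):
    """Fit a Random Forest that predicts the station from the features.

    Returns the fitted model and its accuracy on a held-out 25% split.
    The hyperparameters are illustrative, not those of the article.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    # Random placeholder data; replace with the real features and labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, len(FEATURES)))
    y = rng.choice(STATIONS, size=400)
    model, accuracy = fit_station_classifier(X, y)
    print(f"held-out accuracy on placeholder data: {accuracy:.3f}")
    for name, weight in zip(FEATURES, model.feature_importances_):
        print(f"{name:>10}: {weight:.3f}")
```

With the real data, the same fitted model could be passed to a SHAP explainer to obtain per-feature attributions for each station.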
Created:
2026-05-14