Patterns discovery dataset for particulate matter (pm2.5) pollution trends in Japan
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.hhmgqnkrr
Description:
Air pollution poses a significant environmental risk: it harms human health, accelerates climate change, and disrupts ecosystems. A central aim of air pollution research is to identify the most harmful pollutants reported in previous studies and to map the regions exposed to high pollution levels. This study introduces a large-scale, high-quality dataset for analysing PM2.5 pollution and revealing hidden patterns through pattern mining techniques. The dataset covers over five years of hourly PM2.5 measurements from approximately 1,900 sensors across Japan, sourced from the Ministry of the Environment's Soramame platform, which publishes hourly pollutant records as downloadable monthly raw data files. These unorganised raw files were systematically organised and stored in database tables following an Entity-Relationship (ER) schema.
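For a rough sense of scale, the sketch below counts the hourly timestamps in the collection window given under Methods (January 1, 2018, 01:00:00 to April 25, 2023, 22:00:00, both ends counted) and multiplies by the approximate sensor count. The product is only an upper bound on the number of PM2.5 records, under the assumption that every sensor reported every hour; the actual row count of FINAL_DATASET is not stated here and may be lower.

```python
from datetime import datetime, timedelta

# Collection window stated in the Methods section (first and last hourly timestamps).
start = datetime(2018, 1, 1, 1, 0, 0)
end = datetime(2023, 4, 25, 22, 0, 0)

# Inclusive count of hourly timestamps in the window.
hours = (end - start) // timedelta(hours=1) + 1

# Upper bound on PM2.5 rows, assuming each of ~1,900 sensors reported every hour
# (an assumption; missing or non-reporting sensors would reduce the real count).
max_rows = hours * 1_900

print(hours)                            # → 46582
print(f"{max_rows:,}")                  # → 88,505,800
print(round(hours / (24 * 365.25), 2))  # → 5.31 (years)
```

The window is therefore a little over five years, which matches the description above.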
The primary objective of this dataset is to support the development and validation of pattern mining models that accurately detect frequent patterns in PM2.5 data under diverse conditions. The collection includes the "FINAL_DATASET" CSV file, which contains timestamps, sensor location IDs, and recorded PM2.5 values. Because of storage limitations, the raw data files are not included in the compressed ZIP (AEROS) file; they can be accessed directly via the link provided in the README (Data). Because it exposes complex patterns, this dataset is a valuable resource for researchers applying pattern mining techniques to PM2.5 analysis. Sharing it publicly promotes collaboration and supports efforts to identify frequently polluted sensors and regions. Researchers are invited to use and contribute to the dataset to broaden its relevance and impact.
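One way to look for frequently polluted sensors is to treat each hourly timestamp as a transaction containing the sensors whose PM2.5 value is at or above a cut-off, and then mine frequent itemsets. The sketch below is a minimal illustration under stated assumptions, not the authors' method: it implements only the first two levels of Apriori-style frequent itemset mining (single sensors and sensor pairs). The column names `timestamp`, `sensor_id`, and `pm25`, the 35 µg/m³ cut-off, and the sample rows are all hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical column names: check the actual FINAL_DATASET header before use.
TIME_COL, SENSOR_COL, VALUE_COL = "timestamp", "sensor_id", "pm25"
THRESHOLD = 35.0  # illustrative "polluted" cut-off in µg/m³ (an assumption)


def build_transactions(rows, threshold=THRESHOLD):
    """One transaction per timestamp: the set of sensors at or above the threshold."""
    tx = defaultdict(set)
    for row in rows:
        polluted = tx[row[TIME_COL]]  # create the transaction even if it stays empty
        raw = (row.get(VALUE_COL) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # missing or non-numeric reading
        if value >= threshold:
            polluted.add(row[SENSOR_COL])
    return list(tx.values())


def frequent_itemsets(transactions, min_support):
    """Frequent single sensors and sensor pairs (first two Apriori levels)."""
    n = len(transactions)
    if n == 0:
        return {}
    counts1 = Counter(s for t in transactions for s in t)
    frequent = {s for s, c in counts1.items() if c / n >= min_support}
    result = {frozenset([s]): counts1[s] / n for s in frequent}
    # Apriori property: a pair can only be frequent if both members are frequent.
    counts2 = Counter()
    for t in transactions:
        for pair in combinations(sorted(t & frequent), 2):
            counts2[frozenset(pair)] += 1
    result.update({p: c / n for p, c in counts2.items() if c / n >= min_support})
    return result


# Made-up toy rows, not values from the dataset.
SAMPLE = """timestamp,sensor_id,pm25
2018-01-01 01:00,A,40
2018-01-01 01:00,B,38
2018-01-01 01:00,C,10
2018-01-01 02:00,A,36
2018-01-01 02:00,B,12
2018-01-01 02:00,C,
2018-01-01 03:00,A,50
2018-01-01 03:00,B,41
2018-01-01 03:00,C,37
"""

tx = build_transactions(csv.DictReader(io.StringIO(SAMPLE)))
patterns = frequent_itemsets(tx, min_support=0.5)
for itemset, support in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), round(support, 2))
```

On the toy rows, sensor A is polluted in all three hours, and sensor B and the pair {A, B} in two of three, so all three pass the 0.5 support threshold while C does not. On the full dataset, the cut-off and minimum support would need to be chosen to suit the analysis, and a dedicated frequent-pattern library would scale better than this two-level loop.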
Methods
The air pollution data were collected from Japan's Soramame platform, which provides hourly updates on pollutant levels nationwide. The data files span January 1, 2018, 01:00:00, to April 25, 2023, 22:00:00, and cover records from approximately 1,900 sensors at locations across Japan. The files are originally unorganised CSV files and require systematic organisation by year, month, time, sensor, and pollutant type. To maintain data integrity, we structured the dataset with an Entity-Relationship (ER) schema in a PostgreSQL database comprising two main tables: the Sensor table (storing sensor name, ID, address, and location) and the Observations table (recording pollutant types and their values). The README gives a detailed step-by-step description of this process, which produced a consolidated CSV file containing PM2.5 levels, timestamps, and sensor details.
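The two-table layout described above can be sketched as follows. This is a minimal illustration, not the authors' schema: it uses Python's built-in `sqlite3` in place of PostgreSQL so it runs without a server, and every table name, column name, and row value is hypothetical. The join shows how a consolidated PM2.5 CSV could be exported from the Sensor and Observations tables.

```python
import csv
import io
import sqlite3

# sqlite3 stands in for PostgreSQL so the sketch runs without a server.
# Table and column names are hypothetical; the README documents the real schema.
SCHEMA = """
CREATE TABLE sensor (
    sensor_id  TEXT PRIMARY KEY,
    name       TEXT,
    address    TEXT,
    latitude   REAL,
    longitude  REAL
);
CREATE TABLE observation (
    sensor_id   TEXT NOT NULL REFERENCES sensor (sensor_id),
    observed_at TEXT NOT NULL,   -- ISO-8601 hourly timestamp
    pollutant   TEXT NOT NULL,   -- e.g. PM2.5, NO2
    reading     REAL,            -- NULL when the hourly value is missing
    PRIMARY KEY (sensor_id, observed_at, pollutant)
);
"""


def export_pm25_csv(conn):
    """Join both tables and return a consolidated PM2.5 CSV as a string."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["timestamp", "sensor_id", "latitude", "longitude", "pm25"])
    rows = conn.execute(
        """
        SELECT o.observed_at, s.sensor_id, s.latitude, s.longitude, o.reading
        FROM observation AS o
        JOIN sensor AS s ON s.sensor_id = o.sensor_id
        WHERE o.pollutant = 'PM2.5'
        ORDER BY o.observed_at, s.sensor_id
        """
    )
    writer.writerows(rows)  # None (SQL NULL) is written as an empty field
    return out.getvalue()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Made-up toy rows for illustration only.
conn.execute("INSERT INTO sensor VALUES ('S1', 'Toy station', 'Toy address', 35.0, 139.0)")
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?)",
    [
        ("S1", "2018-01-01T01:00:00", "PM2.5", 12.0),
        ("S1", "2018-01-01T01:00:00", "NO2", 0.02),
        ("S1", "2018-01-01T02:00:00", "PM2.5", None),
    ],
)
csv_text = export_pm25_csv(conn)
print(csv_text)
```

Keeping one row per (sensor, timestamp, pollutant) in the observation table, with sensor metadata stored once in the sensor table, avoids repeating addresses and coordinates in every hourly record; the composite primary key also rejects duplicate hourly readings for the same sensor and pollutant.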
Created:
2024-12-12



