2na-97/FAKER-Air
Source: Hugging Face. Updated 2026-03-26; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/2na-97/FAKER-Air
Resource description:
---
license: mit
task_categories:
- time-series-forecasting
tags:
- climate
- code
pretty_name: FAKER-Air
size_categories:
- 10B<n<100B
---
# FAKER-Air Dataset
This repository contains the dataset used in **FAKER-Air**, consisting of ground-truth air quality observations interpolated onto a grid and CMAQ reanalysis data tailored for East Asia.
- **Paper**: [Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization](https://www.arxiv.org/abs/2511.22169)
- **Code**: [GitHub Repository](https://github.com/kaist-cvml/FAKER-Air)
## Dataset Structure
The data is organized into two main directories inside `data/`:
### 1. Observations (`data/obs`)
Ground-truth station data interpolated onto the CMAQ 27 km grid.
- **Format**: `.npz` (compressed NumPy archives)
- **Naming**: `YYYYMMDDHH_obs.npz` (e.g., `2016010100_obs.npz`)
- **Content**: Arrays of pollutant concentrations (PM2.5, PM10, etc.) on the grid.
- **Total Files**: ~74,000 files (hourly data from 2016 onward, through at least 2023).
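Because each OBS file name encodes its hour, the naming scheme above can be used to index files by time and to spot gaps in the hourly sequence. The following is a minimal sketch using only the standard library; the helper names `parse_obs_timestamp` and `find_missing_hours` are hypothetical, and the timestamps are treated as naive datetimes because the time zone of the file names is not stated here.

```python
from datetime import datetime, timedelta
from pathlib import Path


def parse_obs_timestamp(filename):
    """Parse the YYYYMMDDHH prefix of an OBS file name into a naive datetime."""
    prefix = Path(filename).name.split("_", 1)[0]
    return datetime.strptime(prefix, "%Y%m%d%H")


def find_missing_hours(filenames):
    """Return the hourly timestamps absent between the first and last file."""
    present = {parse_obs_timestamp(name) for name in filenames}
    if not present:
        return []
    current, last = min(present), max(present)
    missing = []
    while current <= last:
        if current not in present:
            missing.append(current)
        current += timedelta(hours=1)
    return missing


# Example with three file names, one hour of which is absent.
names = ["2016010100_obs.npz", "2016010101_obs.npz", "2016010103_obs.npz"]
print(find_missing_hours(names))
```

Running a check like this over `data/obs` before training can help confirm whether the downloaded subset is contiguous.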
### 2. CMAQ Reanalysis (`data/cmaq`)
Physics-based model outputs from CMAQ (Community Multiscale Air Quality).
- **Format**: `.npy` and `.json`
- **Structure**: `YYYY/MM/DD/NIER_27_01/`
- **Files**:
  - `*_x_conc.npy`: Concentration fields.
  - `*_x_metcro2d.npy`: 2D meteorological fields.
  - `*_x_metcro3d.npy`: 3D meteorological fields.
  - `*_meta.json`: Metadata.
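The directory layout above maps a calendar date to a folder, so the files for one day can be located programmatically. The sketch below assumes the dataset has been downloaded under some local `root` directory; the helper names `cmaq_day_dir` and `list_cmaq_files` are hypothetical, and nothing is assumed about the array shapes or the metadata fields.

```python
from datetime import date
from pathlib import Path

CMAQ_SUBDIR = "NIER_27_01"  # folder name taken from the structure listed above

# File-name patterns taken from the file list above.
CMAQ_PATTERNS = {
    "conc": "*_x_conc.npy",
    "metcro2d": "*_x_metcro2d.npy",
    "metcro3d": "*_x_metcro3d.npy",
    "meta": "*_meta.json",
}


def cmaq_day_dir(root, day):
    """Return <root>/data/cmaq/YYYY/MM/DD/NIER_27_01 for the given date."""
    return Path(root, "data", "cmaq", f"{day:%Y}", f"{day:%m}", f"{day:%d}", CMAQ_SUBDIR)


def list_cmaq_files(root, day):
    """Group the files found for one day by kind; empty lists if the folder is missing."""
    day_dir = cmaq_day_dir(root, day)
    if not day_dir.is_dir():
        return {kind: [] for kind in CMAQ_PATTERNS}
    return {kind: sorted(day_dir.glob(pattern)) for kind, pattern in CMAQ_PATTERNS.items()}


print(cmaq_day_dir("downloads", date(2023, 1, 1)))
```

The `.npy` paths returned by `list_cmaq_files` can then be passed to `np.load`, and the `meta` paths to `json.load`.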
## How to Use
You can download specific parts of the dataset using the `huggingface_hub` Python library.
### Prerequisites
```bash
pip install huggingface_hub numpy
```
### Download & Load Example
```python
import os

import numpy as np
from huggingface_hub import snapshot_download

# 1. Download the dataset (files are cached locally).
# To download only specific years or folders, use `allow_patterns`.
local_dir = snapshot_download(
    repo_id="2na-97/FAKER-Air",
    repo_type="dataset",
    allow_patterns=[
        "data/obs/2023*.npz",  # Example: only download OBS for 2023
        "data/cmaq/2023/**",   # Example: only download CMAQ for 2023
    ],
)
print(f"Data downloaded to: {local_dir}")

# 2. Load an OBS file
obs_path = os.path.join(local_dir, "data/obs/2023010100_obs.npz")
if os.path.exists(obs_path):
    data = np.load(obs_path)
    print("Keys in OBS:", data.files)
    # Example access: data['pm25'] (check the printed keys for the actual names)

# 3. Load a CMAQ file
cmaq_path = os.path.join(local_dir, "data/cmaq/2023/01/01/NIER_27_01/20230101_x_conc.npy")
if os.path.exists(cmaq_path):
    cmaq_data = np.load(cmaq_path)
    print("CMAQ Shape:", cmaq_data.shape)
```
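To restrict a download to particular months rather than a whole year, the `allow_patterns` list can be generated from the directory layout described earlier. The sketch below is one way to do that; `build_allow_patterns` and `would_match` are hypothetical helpers, and `would_match` uses the standard library `fnmatch` only as a local approximation of how the patterns are applied, so treat its result as a preview rather than a guarantee.

```python
from fnmatch import fnmatch


def build_allow_patterns(year_months):
    """Build OBS and CMAQ patterns for an iterable of (year, month) pairs."""
    patterns = []
    for year, month in year_months:
        patterns.append(f"data/obs/{year:04d}{month:02d}*.npz")
        patterns.append(f"data/cmaq/{year:04d}/{month:02d}/**")
    return patterns


def would_match(path, patterns):
    """Preview whether a repository path matches any pattern (fnmatch-style)."""
    return any(fnmatch(path, pattern) for pattern in patterns)


patterns = build_allow_patterns([(2023, 1), (2023, 2)])
print(patterns)
print(would_match("data/obs/2023010100_obs.npz", patterns))
```

The resulting list can be passed directly as `allow_patterns=patterns` in the `snapshot_download` call shown above.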
## Citation
```bibtex
@misc{kang2026realtimelonghorizonair,
title={Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization},
author={Inha Kang and Eunki Kim and Wonjeong Ryu and Jaeyo Shin and Seungjun Yu and Yoon-Hee Kang and Seongeun Jeong and Eunhye Kim and Soontae Kim and Hyunjung Shim},
year={2026},
eprint={2511.22169},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.22169},
}
```
Provider: 2na-97



