rachitgoyell/vayu-raw
Hugging Face · updated 2026-03-23 · indexed 2026-03-29
Download link: https://hf-mirror.com/datasets/rachitgoyell/vayu-raw
---
license: cc-by-4.0
language:
- en
tags:
- air-quality
- india
- cpcb
- pollution
- environment
- aqi
pretty_name: VAYU — Raw CPCB Air Quality Data (India)
size_categories:
- 1M<n<10M
---
# VAYU — Raw CPCB Air Quality Data
Raw sensor data collected from India's Central Pollution Control Board (CPCB)
Continuous Ambient Air Quality Monitoring Stations (CAAQMS).
## Contents
| File | Rows | Description |
|---|---|---|
| `aqi_india_38cols_knn_final.csv` | 842,160 | Primary dataset — hourly pollutant readings across 29 cities, KNN-imputed |
| `*_AQIBulletins.csv` (277 files) | ~300,000 | Daily AQI bulletins, one file per city, 277 cities total |
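The card does not say how the 277 bulletin files are meant to be combined. Below is a minimal sketch that stacks them into a single pandas DataFrame. It assumes two things: all files share a compatible column layout, and the text before `_AQIBulletins.csv` in each filename is the city name, which is plausible given "one file per city". The `source_city` column is a hypothetical addition, not part of the original data.

```python
from pathlib import Path

import pandas as pd

SUFFIX = "_AQIBulletins.csv"


def load_bulletins(directory: str) -> pd.DataFrame:
    """Concatenate every `*_AQIBulletins.csv` file found in `directory`.

    Assumption: the filename prefix before `_AQIBulletins.csv` is the city.
    """
    frames = []
    for path in sorted(Path(directory).glob(f"*{SUFFIX}")):
        frame = pd.read_csv(path)
        # Hypothetical column recording which file (city) each row came from;
        # a new name avoids overwriting any existing city column.
        frame["source_city"] = path.name[: -len(SUFFIX)]
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no *{SUFFIX} files found in {directory}")
    return pd.concat(frames, ignore_index=True)
```

Because the city comes from the filename rather than the file body, a renamed file would be silently mislabeled. It is worth checking the result against any city column the bulletins already carry.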
## Key Facts
- **Cities:** 29 cities in primary file, 277 cities across bulletin files
- **Time range:** 2015 – 2024 (hourly readings in the primary file; daily values in the bulletin files)
- **Pollutants:** PM2.5, PM10, NO2, SO2, CO, O3
- **Known issue:** CPCB uses the sentinel value `999` to flag a sensor error.
  It is not a real reading and must be cleaned before use.
- **Total raw files scanned:** 299 (CSV + XLSX)
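The `999` sentinel can be turned into a proper missing value before any analysis. Below is a minimal sketch, not the VAYU pipeline's own cleaning step. It assumes the pollutant columns carry the names listed above (`PM2.5`, `PM10`, `NO2`, `SO2`, `CO`, `O3`). The real CSV headers may differ, so check `df.columns` first. Columns that are absent are skipped.

```python
import pandas as pd

# Pollutants listed on this card; the exact CSV column names are an assumption.
POLLUTANTS = ["PM2.5", "PM10", "NO2", "SO2", "CO", "O3"]
SENTINEL = 999  # CPCB's sensor-error marker


def mask_sentinels(df: pd.DataFrame, columns=POLLUTANTS) -> pd.DataFrame:
    """Return a copy of `df` with the sentinel replaced by NaN in pollutant columns."""
    cleaned = df.copy()
    present = [c for c in columns if c in cleaned.columns]
    if present:
        # DataFrame.mask sets cells to NaN wherever the condition is True;
        # 999 and 999.0 both compare equal to the sentinel.
        cleaned[present] = cleaned[present].mask(cleaned[present] == SENTINEL)
    return cleaned
```

Masking to NaN, rather than dropping rows, keeps the hourly time index intact so that later imputation or resampling can treat the gaps explicitly.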
## How to Use
This dataset is the input to the VAYU data cleaning pipeline.
Run `vayu_step1_setup.ipynb` → `vayu_step2_cleaning.ipynb` to produce the
cleaned version, or load the pre-cleaned version directly from
[vayu-cleaned](https://huggingface.co/datasets/rachitgoyell/vayu-cleaned).
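To fetch the raw primary file locally, for example before running the notebooks, you can use `hf_hub_download` from the `huggingface_hub` package. Below is a minimal sketch. It assumes that package is installed, that network access is available, and that the file sits at the repository root under the name shown in the Contents table. The helper name `fetch_primary_csv` is hypothetical.

```python
from typing import Optional

import pandas as pd

# Repository id and filename are taken from this card.
REPO_ID = "rachitgoyell/vayu-raw"
PRIMARY_FILE = "aqi_india_38cols_knn_final.csv"


def fetch_primary_csv(revision: Optional[str] = None) -> pd.DataFrame:
    """Download the primary raw CSV from the Hugging Face Hub and load it.

    Assumes the file is at the repository root; requires network access.
    """
    # Imported here so this module can be imported without huggingface_hub.
    from huggingface_hub import hf_hub_download

    # hf_hub_download caches the file locally and returns the cached path.
    local_path = hf_hub_download(
        repo_id=REPO_ID,
        filename=PRIMARY_FILE,
        repo_type="dataset",
        revision=revision,
    )
    return pd.read_csv(local_path)
```

To download through the hf-mirror.com mirror listed at the top of this page, set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing `huggingface_hub`.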
## Related Repository
Cleaned and model-ready version:
[rachitgoyell/vayu-cleaned](https://huggingface.co/datasets/rachitgoyell/vayu-cleaned)
Provider: rachitgoyell



