cassini-team-todo/eea-waterbase-cleaned
Hugging Face · updated 2026-04-25 · indexed 2026-05-03
Download link: https://hf-mirror.com/datasets/cassini-team-todo/eea-waterbase-cleaned
Resource description:
---
pretty_name: EEA Waterbase Optics5 — Cleaned River Measurements (2015–2017)
license: other
license_name: eea-reuse-policy
license_link: https://www.eea.europa.eu/en/legal-notice
language:
- en
tags:
- water-quality
- environmental
- eea
- wise
- eu
- hydrology
- sentinel-2
- remote-sensing
- river
- geoparquet
size_categories:
- 100K<n<1M
modality:
- geospatial
---
# EEA Waterbase Optics5 — Cleaned River Measurements (2015–2017)
Filtered and cleaned subset of the [EEA Waterbase – Water Quality ICM (WISE-4) v2018.1](https://huggingface.co/datasets/cassini-team-todo/eea-waterbase) prepared for training Sentinel-2 water quality models.
Produced during the **11th CASSINI Hackathon – EU Space for Water**.
## Filtering criteria
- **Water body category**: rivers only (`parameterWaterBodyCategory = RW`)
- **Station coordinates**: stations must have valid `lon` and `lat`
- **Date window**: 2015-07-01 – 2017-12-31 (aligned with Sentinel-2A availability)
- **Determinands (Optics5)**: five parameters with the strongest expected link to Sentinel-2 surface reflectance
| Code | Parameter | Unit |
|---|---|---|
| `EEA_3133-01-5` | Dissolved oxygen | mg{O2}/L |
| `EEA_3131-01-9` | Oxygen saturation | % |
| `CAS_14797-55-8` | Nitrate | mg{NO3}/L |
| `CAS_14798-03-9` | Ammonium | mg{NH4}/L |
| `CAS_14265-44-2` | Phosphate-P | mg{P}/L |
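The four criteria above can be sketched as a single record-level predicate. This is a minimal illustration, not the pipeline that produced the dataset: the record keys `sampling_date` and `determinand_code` are borrowed from the cleaned output schema (the raw Waterbase column names may differ), the coordinate range check is one possible reading of "valid", and the toy records are invented.

```python
from datetime import date

# Optics5 determinand codes, copied from the table above.
OPTICS5 = {
    "EEA_3133-01-5",
    "EEA_3131-01-9",
    "CAS_14797-55-8",
    "CAS_14798-03-9",
    "CAS_14265-44-2",
}
WINDOW_START = date(2015, 7, 1)
WINDOW_END = date(2017, 12, 31)


def has_valid_coords(rec: dict) -> bool:
    """Assumed meaning of 'valid': both present and inside lon/lat ranges."""
    lon, lat = rec.get("lon"), rec.get("lat")
    if lon is None or lat is None:
        return False
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0


def keep_record(rec: dict) -> bool:
    """Return True if a record passes all four filtering criteria."""
    return (
        rec.get("parameterWaterBodyCategory") == "RW"  # rivers only
        and has_valid_coords(rec)
        and WINDOW_START <= rec["sampling_date"] <= WINDOW_END
        and rec["determinand_code"] in OPTICS5
    )


# Toy records (not real data): only the first one should survive.
records = [
    {"parameterWaterBodyCategory": "RW", "lon": 4.9, "lat": 52.4,
     "sampling_date": date(2016, 5, 3), "determinand_code": "CAS_14797-55-8"},
    {"parameterWaterBodyCategory": "LW", "lon": 4.9, "lat": 52.4,   # not a river
     "sampling_date": date(2016, 5, 3), "determinand_code": "CAS_14797-55-8"},
    {"parameterWaterBodyCategory": "RW", "lon": None, "lat": 52.4,  # missing lon
     "sampling_date": date(2016, 5, 3), "determinand_code": "CAS_14797-55-8"},
    {"parameterWaterBodyCategory": "RW", "lon": 4.9, "lat": 52.4,   # before window
     "sampling_date": date(2015, 6, 30), "determinand_code": "CAS_14797-55-8"},
]
kept = [r for r in records if keep_record(r)]
```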
## Files
### `station_river_bboxes.geoparquet`
**Primary spatial file for model training.** One row per station that could be matched to its nearest HydroRIVERS segment. The geometry is a corridor polygon, about 5 km long, that follows the river shape. It is suitable as a Sentinel-2 query window, with the station's snap point at the midpoint of the corridor.
Construction:
1. Each station is snapped onto its nearest HydroRIVERS segment (max 5 km)
2. All connected river geometry within 2.5 km of the snap point is merged into a continuous linestring
3. A 5 km sub-segment centred on the snap point is extracted (2.5 km each direction)
4. The segment is buffered by 300 m on each side, giving a corridor polygon in WGS84
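Steps 1 and 3 can be illustrated with plain arc-length arithmetic. The sketch below is not the code used to build the file: it assumes the river is already one merged polyline (step 2) in a projected CRS with metre units, the helper names `snap_to_line`, `point_at` and `sub_segment` are hypothetical, and the coordinates are invented. Step 4 (the 300 m buffer and the reprojection to WGS84) is left out; a geometry library such as Shapely could handle it.

```python
import math

Point = tuple[float, float]
MAX_SNAP_M = 5000.0   # step 1: maximum snapping distance
HALF_LEN_M = 2500.0   # step 3: extent on each side of the snap point


def snap_to_line(line: list[Point], pt: Point) -> tuple[float, float]:
    """Return (arc length of the closest point on the line, distance to it)."""
    best_along, best_dist, walked = 0.0, math.inf, 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue
        # Fraction along this piece of the closest point, clamped to [0, 1].
        t = ((pt[0] - x1) * dx + (pt[1] - y1) * dy) / seg_len**2
        t = max(0.0, min(1.0, t))
        dist = math.hypot(pt[0] - (x1 + t * dx), pt[1] - (y1 + t * dy))
        if dist < best_dist:
            best_along, best_dist = walked + t * seg_len, dist
        walked += seg_len
    return best_along, best_dist


def point_at(line: list[Point], along: float) -> Point:
    """Interpolate the point at arc length `along` from the line start."""
    walked = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len > 0 and walked + seg_len >= along:
            t = (along - walked) / seg_len
            return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
        walked += seg_len
    return line[-1]


def sub_segment(line: list[Point], start: float, end: float) -> list[Point]:
    """Extract the part of the line between two arc lengths (clamped)."""
    total = sum(math.hypot(x2 - x1, y2 - y1)
                for (x1, y1), (x2, y2) in zip(line, line[1:]))
    start, end = max(0.0, start), min(total, end)
    pts, walked = [point_at(line, start)], 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        walked += math.hypot(x2 - x1, y2 - y1)
        if start < walked < end:   # keep interior vertices only
            pts.append((x2, y2))
    pts.append(point_at(line, end))
    return pts


# Invented river (10 km, one bend) and station, in metres.
river = [(0.0, 0.0), (4000.0, 0.0), (4000.0, 6000.0)]
station = (3000.0, 200.0)

along, snap_dist = snap_to_line(river, station)
centreline = None
if snap_dist <= MAX_SNAP_M:  # stations farther away would be dropped
    centreline = sub_segment(river, along - HALF_LEN_M, along + HALF_LEN_M)
```

Near the end of a river line the sub-segment is clamped, so in this sketch the corridor can be shorter than 5 km and the snap point off-centre; how the original pipeline treats that case is not stated in the card.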
| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Original station coordinates (WGS84) |
| `first_date` | Earliest measurement date in window |
| `last_date` | Latest measurement date in window |
| `n_measurements` | Total number of measurements |
| `n_determinands` | Number of distinct Optics5 codes measured |
| `HYRIV_ID` | Matched HydroRIVERS segment ID |
| `MAIN_RIV` | HydroRIVERS main-river ID |
| `snap_dist_m` | Distance from station to the river line (metres) |
| `geometry` | Corridor polygon (Polygon, WGS84) following the river shape |
| `bbox_west/south/east/north` | Envelope coordinates of the corridor for Sentinel Hub API calls |
**4,742 stations** (15 of the 4,757 eligible stations were dropped because they lie farther than 5 km from any HydroRIVERS segment)
### `measurements_optics5_long.parquet`
Long-format table of all individual measurements. One row per (station, sampling date, determinand).
| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Station coordinates (WGS84) |
| `sampling_date` | Date of measurement |
| `determinand_code` | One of the Optics5 codes above |
| `value` | Observed value (numeric) |
| `unit` | Unit of measurement |
| `below_loq` | Flag: value is below limit of quantification |
| `loq_value` | Limit of quantification value |
**4,757 stations · 251,245 rows · 2015-07-01 to 2017-12-31**
Determinand breakdown:
| Code | Rows | Stations |
|---|---|---|
| `EEA_3133-01-5` | 32,330 | 3,594 |
| `EEA_3131-01-9` | 33,301 | 3,163 |
| `CAS_14797-55-8` | 63,488 | 3,641 |
| `CAS_14798-03-9` | 65,025 | 3,495 |
| `CAS_14265-44-2` | 57,101 | 3,122 |
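As a quick consistency check, the per-code row counts should add up to the 251,245 rows stated above, while the per-code station counts overlap (one station can report several determinands) and so add up to more than 4,757. The sketch below only re-does that arithmetic with the figures from this card; the derived mean assumes each per-code station count is a count of distinct stations.

```python
# Figures copied from the breakdown table above: code -> (rows, stations).
breakdown = {
    "EEA_3133-01-5": (32_330, 3_594),
    "EEA_3131-01-9": (33_301, 3_163),
    "CAS_14797-55-8": (63_488, 3_641),
    "CAS_14798-03-9": (65_025, 3_495),
    "CAS_14265-44-2": (57_101, 3_122),
}
TOTAL_ROWS = 251_245
TOTAL_STATIONS = 4_757

row_sum = sum(rows for rows, _ in breakdown.values())
station_code_pairs = sum(stations for _, stations in breakdown.values())

# Share of rows per determinand, in percent.
row_share = {code: 100 * rows / TOTAL_ROWS for code, (rows, _) in breakdown.items()}

# Each (station, code) pair is counted once, so this is the mean number of
# distinct Optics5 codes per station (roughly 3.6 of 5).
mean_codes_per_station = station_code_pairs / TOTAL_STATIONS
```

The nutrient codes account for roughly three quarters of the rows, which is worth keeping in mind when comparing per-determinand model results.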
### `station_index_optics5.parquet`
One row per eligible station with summary statistics.
| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Station coordinates (WGS84) |
| `first_date` | Earliest measurement date in window |
| `last_date` | Latest measurement date in window |
| `n_measurements` | Total number of non-null value rows |
| `n_determinands` | Number of distinct Optics5 codes measured |
**4,757 rows**
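The summary columns can be derived from the long-format table with one pass over its rows. The following is a rough sketch on invented rows (the station IDs `ST001` and `ST002` are hypothetical); it assumes that rows with a null `value` are skipped for every statistic, whereas the card only states this for `n_measurements`.

```python
from datetime import date

# Toy long-format rows (not real data).
rows = [
    {"station_id": "ST001", "sampling_date": date(2016, 3, 1),
     "determinand_code": "CAS_14797-55-8", "value": 2.1},
    {"station_id": "ST001", "sampling_date": date(2017, 6, 9),
     "determinand_code": "EEA_3131-01-9", "value": 95.0},
    {"station_id": "ST001", "sampling_date": date(2017, 8, 2),
     "determinand_code": "EEA_3131-01-9", "value": None},  # null value: skipped
    {"station_id": "ST002", "sampling_date": date(2015, 9, 14),
     "determinand_code": "CAS_14798-03-9", "value": 0.04},
]

index: dict[str, dict] = {}
for r in rows:
    if r["value"] is None:
        continue
    entry = index.setdefault(r["station_id"], {
        "first_date": r["sampling_date"],
        "last_date": r["sampling_date"],
        "n_measurements": 0,
        "codes": set(),
    })
    entry["first_date"] = min(entry["first_date"], r["sampling_date"])
    entry["last_date"] = max(entry["last_date"], r["sampling_date"])
    entry["n_measurements"] += 1
    entry["codes"].add(r["determinand_code"])

# Replace the helper set with the distinct-code count.
for entry in index.values():
    entry["n_determinands"] = len(entry.pop("codes"))
```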
### `station_year_coverage_optics5.parquet`
Audit table: number of measurements per station, year, and determinand code.
| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `year` | Calendar year |
| `determinand_code` | Optics5 determinand code |
| `n_measurements` | Number of measurements in that year |
**33,419 rows**
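An audit table of this shape amounts to a grouped count over (station, year, determinand). A minimal sketch with `collections.Counter` on invented rows follows; the station IDs are hypothetical and the original build may have used a dataframe library instead.

```python
from collections import Counter
from datetime import date

# Toy (station_id, sampling_date, determinand_code) tuples, not real data.
measurements = [
    ("ST001", date(2016, 3, 1), "CAS_14797-55-8"),
    ("ST001", date(2017, 6, 9), "EEA_3131-01-9"),
    ("ST001", date(2017, 8, 2), "EEA_3131-01-9"),
    ("ST002", date(2015, 9, 14), "CAS_14798-03-9"),
]

# Key: (station_id, year, determinand_code); value: n_measurements.
coverage = Counter(
    (station_id, sampling_date.year, code)
    for station_id, sampling_date, code in measurements
)

# Flatten to rows matching the column layout described above.
coverage_rows = [
    {"station_id": s, "year": y, "determinand_code": c, "n_measurements": n}
    for (s, y, c), n in sorted(coverage.items())
]
```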
## Intended use
Each training sample for a Sentinel-2 model is a `(station, date)` pair joined to the nearest cloud-free Sentinel-2 scene within a ±N day window. The corridor polygon in `station_river_bboxes.geoparquet` defines the spatial extent of the Sentinel-2 query — pass `bbox_west/south/east/north` directly to the Sentinel Hub Statistical API. Separate models (or a multitask model with target masking) are trained per determinand — co-presence of all five codes on the same date is not required.
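The temporal matching described above could look roughly like this. It is a sketch under stated assumptions: the scene catalogue is a plain list of dicts with hypothetical `date` and `cloud_fraction` keys, "cloud-free" is modelled as a cloud fraction at or below a threshold, and the window size N is not fixed by this card, so the values used are illustrative only.

```python
from datetime import date, timedelta
from typing import Optional


def nearest_clear_scene(sample_date: date, scenes: list[dict],
                        max_days: int, max_cloud: float = 0.0) -> Optional[dict]:
    """Pick the cloud-free scene closest in time, within +/- max_days."""
    window = timedelta(days=max_days)
    candidates = [
        s for s in scenes
        if abs(s["date"] - sample_date) <= window
        and s["cloud_fraction"] <= max_cloud
    ]
    # min() with default=None returns None when no scene qualifies.
    return min(candidates, key=lambda s: abs(s["date"] - sample_date), default=None)


# Invented scene catalogue for one corridor (not real acquisitions).
scenes = [
    {"date": date(2016, 4, 25), "cloud_fraction": 0.0},
    {"date": date(2016, 5, 2), "cloud_fraction": 0.8},   # closest, but cloudy
    {"date": date(2016, 5, 6), "cloud_fraction": 0.0},
]
sample_date = date(2016, 5, 3)

match_5d = nearest_clear_scene(sample_date, scenes, max_days=5)
match_2d = nearest_clear_scene(sample_date, scenes, max_days=2)  # no clear scene
```

A sample with no qualifying scene (here `match_2d`) would simply be excluded from training for that date.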
## Source
Derived from `cassini-team-todo/eea-waterbase` (mirror of EEA Waterbase WISE-4 v2018.1).
River network from [`cassini-team-todo/hydro-rivers-europe`](https://huggingface.co/datasets/cassini-team-todo/hydro-rivers-europe) (HydroRIVERS v1.0).
Original data © European Environment Agency, reused under the [EEA reuse policy](https://www.eea.europa.eu/en/legal-notice).
Provider: cassini-team-todo



