
cassini-team-todo/eea-waterbase-cleaned

Hugging Face · Updated 2026-04-25 · Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/cassini-team-todo/eea-waterbase-cleaned
Description:
---
pretty_name: EEA Waterbase Optics5 — Cleaned River Measurements (2015–2017)
license: other
license_name: eea-reuse-policy
license_link: https://www.eea.europa.eu/en/legal-notice
language:
- en
tags:
- water-quality
- environmental
- eea
- wise
- eu
- hydrology
- sentinel-2
- remote-sensing
- river
- geoparquet
size_categories:
- 100K<n<1M
modality:
- geospatial
---

# EEA Waterbase Optics5 — Cleaned River Measurements (2015–2017)

Filtered and cleaned subset of the [EEA Waterbase – Water Quality ICM (WISE-4) v2018.1](https://huggingface.co/datasets/cassini-team-todo/eea-waterbase) prepared for training Sentinel-2 water quality models.

Produced during the **11th CASSINI Hackathon – EU Space for Water**.

## Filtering criteria

- **Water body category**: rivers only (`parameterWaterBodyCategory = RW`)
- **Station coordinates**: stations must have valid `lon` and `lat`
- **Date window**: 2015-07-01 to 2017-12-31 (aligned with Sentinel-2A availability)
- **Determinands (Optics5)**: five parameters with the strongest expected link to Sentinel-2 surface reflectance

| Code | Parameter | Unit |
|---|---|---|
| `EEA_3133-01-5` | Dissolved oxygen | mg{O2}/L |
| `EEA_3131-01-9` | Oxygen saturation | % |
| `CAS_14797-55-8` | Nitrate | mg{NO3}/L |
| `CAS_14798-03-9` | Ammonium | mg{NH4}/L |
| `CAS_14265-44-2` | Phosphate-P | mg{P}/L |

## Files

### `station_river_bboxes.geoparquet`

**Primary spatial file for model training.** One row per station matched to its nearest HydroRIVERS segment. The geometry is a ~5 km corridor polygon that follows the river shape, suitable as a Sentinel-2 query window with the station at the midpoint.

Construction:

1. Each station is snapped onto its nearest HydroRIVERS segment (max 5 km)
2. All connected river geometry within 2.5 km of the snap point is merged into a continuous linestring
3. A 5 km sub-segment centred on the snap point is extracted (2.5 km each direction)
4. The segment is buffered 300 m each side to give a corridor polygon in WGS84

| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Original station coordinates (WGS84) |
| `first_date` | Earliest measurement date in window |
| `last_date` | Latest measurement date in window |
| `n_measurements` | Total number of measurements |
| `n_determinands` | Number of distinct Optics5 codes measured |
| `HYRIV_ID` | Matched HydroRIVERS segment ID |
| `MAIN_RIV` | HydroRIVERS main-river ID |
| `snap_dist_m` | Distance from station to the river line (metres) |
| `geometry` | Corridor polygon (Polygon, WGS84) following the river shape |
| `bbox_west/south/east/north` | Envelope coordinates of the corridor for Sentinel Hub API calls |

**4,742 stations** (15 dropped: farther than 5 km from any river)

### `measurements_optics5_long.parquet`

Long-format table of all individual measurements. One row per (station, sampling date, determinand).

| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Station coordinates (WGS84) |
| `sampling_date` | Date of measurement |
| `determinand_code` | One of the Optics5 codes above |
| `value` | Observed value (numeric) |
| `unit` | Unit of measurement |
| `below_loq` | Flag: value is below limit of quantification |
| `loq_value` | Limit of quantification value |

**4,757 stations · 251,245 rows · 2015-07-01 to 2017-12-31**

Determinand breakdown:

| Code | Rows | Stations |
|---|---|---|
| `EEA_3133-01-5` | 32,330 | 3,594 |
| `EEA_3131-01-9` | 33,301 | 3,163 |
| `CAS_14797-55-8` | 63,488 | 3,641 |
| `CAS_14798-03-9` | 65,025 | 3,495 |
| `CAS_14265-44-2` | 57,101 | 3,122 |

### `station_index_optics5.parquet`

One row per eligible station with summary statistics.

| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `lat`, `lon` | Station coordinates (WGS84) |
| `first_date` | Earliest measurement date in window |
| `last_date` | Latest measurement date in window |
| `n_measurements` | Total number of non-null value rows |
| `n_determinands` | Number of distinct Optics5 codes measured |

**4,757 rows**

### `station_year_coverage_optics5.parquet`

Audit table: number of measurements per station, year, and determinand code.

| Column | Description |
|---|---|
| `station_id` | Monitoring site identifier |
| `year` | Calendar year |
| `determinand_code` | Optics5 determinand code |
| `n_measurements` | Number of measurements in that year |

**33,419 rows**

## Intended use

Each training sample for a Sentinel-2 model is a `(station, date)` pair joined to the nearest cloud-free Sentinel-2 scene within a ±N day window. The corridor polygon in `station_river_bboxes.geoparquet` defines the spatial extent of the Sentinel-2 query; pass `bbox_west/south/east/north` directly to the Sentinel Hub Statistical API.

Separate models (or a multitask model with target masking) are trained per determinand, so co-presence of all five codes on the same date is not required.

## Source

Derived from `cassini-team-todo/eea-waterbase` (mirror of EEA Waterbase WISE-4 v2018.1). River network from [`cassini-team-todo/hydro-rivers-europe`](https://huggingface.co/datasets/cassini-team-todo/hydro-rivers-europe) (HydroRIVERS v1.0).

Original data © European Environment Agency, reused under the [EEA reuse policy](https://www.eea.europa.eu/en/legal-notice).
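
## Example: pairing measurements with scene dates

The `(station, date)` pairing described under "Intended use" can be sketched in plain Python. This is a minimal illustration, not the pipeline used to build the dataset: the function names `nearest_scene` and `build_pairs`, the station IDs, the scene dates, the measurement values, and the choice of N = 3 days are all assumptions made for the example. In practice the tuples would come from `measurements_optics5_long.parquet` and the scene dates from a cloud-filtered Sentinel-2 catalogue query.

```python
from bisect import bisect_left
from datetime import date, timedelta


def nearest_scene(sample_date, scene_dates, max_days):
    """Return the scene date closest to sample_date within +/- max_days, or None.

    scene_dates must be sorted ascending. On a tie the earlier scene wins.
    """
    i = bisect_left(scene_dates, sample_date)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = scene_dates[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda d: abs(d - sample_date), default=None)
    if best is None or abs(best - sample_date) > timedelta(days=max_days):
        return None
    return best


def build_pairs(measurements, scenes_by_station, max_days=3):
    """Join (station_id, sampling_date, determinand_code, value) tuples to scenes.

    max_days=3 is an arbitrary placeholder for the unspecified N in the card.
    Measurements with no scene inside the window are dropped.
    """
    pairs = []
    for station_id, sampling_date, code, value in measurements:
        scene = nearest_scene(
            sampling_date, scenes_by_station.get(station_id, []), max_days
        )
        if scene is not None:
            pairs.append((station_id, sampling_date, scene, code, value))
    return pairs


# Hypothetical cloud-free scene dates per station (illustrative only).
scenes = {"ST_A": [date(2016, 5, 1), date(2016, 5, 11), date(2016, 5, 21)]}

# Hypothetical measurements (illustrative values, not taken from the dataset).
measurements = [
    ("ST_A", date(2016, 5, 3), "CAS_14797-55-8", 2.1),   # 2 days from a scene
    ("ST_A", date(2016, 5, 16), "CAS_14797-55-8", 1.8),  # 5 days from any scene
    ("ST_B", date(2016, 5, 3), "EEA_3133-01-5", 9.4),    # station has no scenes
]

pairs = build_pairs(measurements, scenes, max_days=3)
for row in pairs:
    print(row)
```

Because each measurement row carries its own `determinand_code`, the same pairing serves per-determinand models and a masked multitask model alike: rows are simply filtered or grouped by code afterwards.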
Provider:
cassini-team-todo