cassini-team-todo/eu-hydro-master-skeleton
Hugging Face · updated 2026-04-24 · indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/cassini-team-todo/eu-hydro-master-skeleton
Description:
---
license: cc-by-4.0
pretty_name: EU-Hydro Master Skeleton (curated GeoParquet)
tags:
- geospatial
- hydrology
- europe
- geoparquet
- copernicus
---
# EU-Hydro Master Skeleton
Per-basin GeoParquet shards derived from the Copernicus **EU-Hydro v1.3** GeoPackages. Four layers are published — river centerlines, river-surface polygons, inland-water polygons (lakes + wide waters), and river-basin polygons — all reprojected to a common CRS and stripped of admin-only columns for easier querying.
## Contents
```
eu_hydro_master_skeleton_geoparquet/
├── river_lines/ # River_Net_l MultiLineString ~1.3 M features
├── river_polygons/ # River_Net_p MultiPolygon ~12 k features
├── inland_water/ # InlandWater MultiPolygon ~380 k features
├── river_basins/ # RiverBasins MultiPolygon ~100 features
└── manifest.csv # layer, file, source_basin, features
merge_euhydro.py # reproducer (extract zip → curate → GeoParquet)
```
Each subdirectory holds one shard per basin: `euhydro_<basin>_v013.geoparquet`. Shards are zstd-compressed and carry a `source_basin` column so you can concat them without losing provenance.
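Because the basin name is also encoded in every shard file name, provenance can be recovered from a path alone. The following is a minimal sketch based on the naming pattern above; `shard_name` and `basin_from_shard` are hypothetical helpers, not part of this dataset or of `merge_euhydro.py`, and the regular expression assumes that basin identifiers never contain the `_v013` suffix themselves.

```python
import re
from pathlib import PurePosixPath

# Naming pattern documented above: euhydro_<basin>_v013.geoparquet
_SHARD_RE = re.compile(r"^euhydro_(?P<basin>.+)_v013\.geoparquet$")


def shard_name(basin: str) -> str:
    """Build the shard file name for a basin identifier (hypothetical helper)."""
    return f"euhydro_{basin}_v013.geoparquet"


def basin_from_shard(path: str) -> str:
    """Extract the basin identifier from a shard path or file name."""
    name = PurePosixPath(path).name
    match = _SHARD_RE.match(name)
    if match is None:
        raise ValueError(f"not an EU-Hydro shard name: {name!r}")
    return match.group("basin")
```

For a shard that was read with GeoPandas, the value returned by `basin_from_shard` should agree with the shard's `source_basin` column, which makes it a cheap consistency check after renaming or moving files.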
### Layer quick reference
| Directory | Source layer | Geometry | Use case |
|---|---|---|---|
| `river_lines/` | `River_Net_l` | MultiLineString | Network topology, Strahler order, routing |
| `river_polygons/` | `River_Net_p` | MultiPolygon | Actual water-surface area of wide rivers — useful for satellite-image overlap (Sentinel-2 NDWI etc.) |
| `inland_water/` | `InlandWater` | MultiPolygon | Lakes, reservoirs, and any water body modelled as an area |
| `river_basins/` | `RiverBasins` | MultiPolygon | Catchment polygons for aggregating per-basin stats |
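The table can be turned into a small lookup for sanity-checking a shard after loading it. This is only a sketch: `LAYERS` copies the table above, and `unexpected_geom_types` is a hypothetical helper that reports any geometry type other than the documented one.

```python
# Directory -> (source layer, documented geometry type), copied from the table above.
LAYERS = {
    "river_lines": ("River_Net_l", "MultiLineString"),
    "river_polygons": ("River_Net_p", "MultiPolygon"),
    "inland_water": ("InlandWater", "MultiPolygon"),
    "river_basins": ("RiverBasins", "MultiPolygon"),
}


def unexpected_geom_types(directory, geom_types):
    """Return the geometry types that differ from the documented one, sorted."""
    _, expected = LAYERS[directory]
    return sorted(set(geom_types) - {expected})
```

With a shard loaded as a GeoDataFrame `gdf`, a call such as `unexpected_geom_types("river_polygons", gdf.geom_type.unique())` returns an empty list when every feature has the documented type.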
### Common properties
- **CRS**: `EPSG:3035` (ETRS89 / LAEA Europe) — units are metres, safe for area/length maths without reprojection.
- **Dimensions**: 2D. Original Z/M values from EU-Hydro were dropped.
- **Columns**: admin-only fields (`BEGLIFEVER`, `ENDLIFEVER`, `UPDAT_BY`, `UPDAT_WHEN`) were removed. All other source columns are preserved.
- **Excluded basins**: `fr_guiana`, `fr_islands`, `iceland` — non-continental or overseas.
One note: the `hondo` basin has no `river_polygons/` shard. That's expected — the source GPKG's `River_Net_p` layer is empty for this basin (no rivers wide enough to be captured as an area).
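Gaps like this one can be confirmed from `manifest.csv` rather than by listing directories. The sketch below assumes the manifest header uses exactly the column names `layer`, `file`, `source_basin`, and `features`, and that the `layer` column holds one label per layer; the exact label values are not documented here, so pass whichever value the manifest actually uses. `basins_missing_layer` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict


def basins_missing_layer(manifest_text: str, layer: str) -> list[str]:
    """List basins that appear in the manifest but have no shard for `layer`."""
    basins_by_layer = defaultdict(set)
    for row in csv.DictReader(io.StringIO(manifest_text)):
        # Assumed column names: layer, source_basin
        basins_by_layer[row["layer"]].add(row["source_basin"])
    all_basins = set().union(*basins_by_layer.values())
    return sorted(all_basins - basins_by_layer[layer])
```

Pass the manifest contents as text, for example `Path(".../manifest.csv").read_text()`. A basin is reported only if it has at least one shard in some other layer.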
## Quick start
```python
import geopandas as gpd
from huggingface_hub import hf_hub_download

# Fetch a single shard: the river-surface polygons of the Shannon basin.
path = hf_hub_download(
    "cassini-team-todo/eu-hydro-master-skeleton",
    "eu_hydro_master_skeleton_geoparquet/river_polygons/euhydro_shannon_v013.geoparquet",
    repo_type="dataset",
)
shannon_polys = gpd.read_parquet(path)

# Print the CRS, the geometry types present, and the feature count.
print(shannon_polys.crs, shannon_polys.geom_type.unique(), len(shannon_polys))
```
Loading a whole layer across all basins:
```python
from pathlib import Path

import geopandas as gpd
import pandas as pd
from huggingface_hub import snapshot_download

# Download only the inland-water shards.
local_root = Path(snapshot_download(
    "cassini-team-todo/eu-hydro-master-skeleton",
    repo_type="dataset",
    allow_patterns="eu_hydro_master_skeleton_geoparquet/inland_water/*.geoparquet",
))

# Read every shard and concatenate them into one GeoDataFrame.
shards = sorted((local_root / "eu_hydro_master_skeleton_geoparquet" / "inland_water").glob("*.geoparquet"))
inland_eu = pd.concat([gpd.read_parquet(p) for p in shards], ignore_index=True)
```
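When only a few basins are needed, the download can be narrowed further by passing a list of exact shard paths as `allow_patterns`. This is a sketch, not part of the dataset tooling: `shard_patterns` and `load_basins` are hypothetical helpers, and the basin identifiers must match the ones used in the shard file names.

```python
from pathlib import Path

REPO_ID = "cassini-team-todo/eu-hydro-master-skeleton"
ROOT = "eu_hydro_master_skeleton_geoparquet"


def shard_patterns(layer, basins):
    """Repo-relative shard paths for one layer and the selected basins."""
    return [f"{ROOT}/{layer}/euhydro_{basin}_v013.geoparquet" for basin in basins]


def load_basins(layer, basins):
    """Download and concatenate the shards of `layer` for the given basins."""
    # Imported here so shard_patterns() stays usable without the geo stack.
    import geopandas as gpd
    import pandas as pd
    from huggingface_hub import snapshot_download

    patterns = shard_patterns(layer, basins)
    local_root = Path(snapshot_download(
        REPO_ID, repo_type="dataset", allow_patterns=patterns,
    ))
    # Use the explicit paths rather than a glob, so shards cached by earlier
    # downloads are not picked up; basins without a shard are skipped.
    shards = [local_root / p for p in patterns if (local_root / p).exists()]
    return pd.concat([gpd.read_parquet(p) for p in shards], ignore_index=True)
```

For example, `load_basins("inland_water", ["shannon"])` fetches a single file. Note that `pd.concat` raises a `ValueError` if none of the requested shards exist.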
## Reproducing
The source GeoPackages aren't redistributable through this repo — download them from the Copernicus Land portal (EU-Hydro v1.3, per-basin GPKG zips). Place all `euhydro_*_v013_GPKG.zip` files in the same directory as `merge_euhydro.py` and run:
```bash
python merge_euhydro.py
```
This will extract each zip to a temp dir, read the four target layers with `force_2d=True`, drop admin columns, reproject any basin whose CRS differs from the first valid one, and write the output tree next to the script. A `manifest.csv` summarizing every shard is written at the end.
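The per-basin steps described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `merge_euhydro.py`: the function names are hypothetical, the GeoPackage is assumed to be the first `*.gpkg` file found in the archive, and `force_2d=True` is passed through GeoPandas to the pyogrio engine.

```python
import zipfile
from pathlib import Path

# Admin-only columns listed under "Common properties".
ADMIN_COLUMNS = ["BEGLIFEVER", "ENDLIFEVER", "UPDAT_BY", "UPDAT_WHEN"]


def extract_gpkg(zip_path: Path, work_dir: Path) -> Path:
    """Unpack one basin zip and return the first .gpkg found inside it."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(work_dir)
    # Note: this match is case-sensitive on most Linux file systems.
    candidates = sorted(work_dir.rglob("*.gpkg"))
    if not candidates:
        raise FileNotFoundError(f"no .gpkg inside {zip_path.name}")
    return candidates[0]


def curate_layer(gpkg_path: Path, layer: str, target_crs=None):
    """Read one layer in 2D, drop admin columns, and reproject if needed."""
    import geopandas as gpd  # local import: only needed for the geo step

    gdf = gpd.read_file(gpkg_path, layer=layer, engine="pyogrio", force_2d=True)
    gdf = gdf.drop(columns=ADMIN_COLUMNS, errors="ignore")
    if target_crs is not None and gdf.crs != target_crs:
        gdf = gdf.to_crs(target_crs)
    return gdf
```

A full run would call `extract_gpkg` inside a `tempfile.TemporaryDirectory()`, apply `curate_layer` to each of the four source layers, add the `source_basin` column, and write each result with `gdf.to_parquet(out_path, compression="zstd")`.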
## Source and licensing
- **Source**: Copernicus EU-Hydro v1.3 — https://land.copernicus.eu/en/products/eu-hydro
- **License**: Copernicus data is free and open (attribution required). This derived dataset is released under CC-BY-4.0; please cite Copernicus EU-Hydro in any downstream use.
Provider:
cassini-team-todo



