Monthly surface salinity maps of the Mekong Delta (2015--2025) at 30 m resolution
Source: DataCite Commons. Updated 2026-05-03; indexed 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.20003854
Description
This dataset contains monthly surface salinity maps (in practical salinity units, PSU) at 30 m spatial resolution for the Mekong Delta estuary, spanning 2015--2025. The maps are produced from a physics-aware Random Forest model trained on Harmonized Landsat Sentinel-2 (HLS v2) optical reflectance, Sentinel-1 SAR backscatter, hydrological forcing (discharge and tidal statistics), topographic covariates, and discharge--position interaction terms. A square-root target transform with CROCO hydrodynamic model boundary regularization is used to improve spatial transferability. A SAR+forcing+position fallback model fills cloud-masked optical gaps, ensuring spatially complete monthly coverage.
The mapped domain covers the tide-influenced distributary network of the Vietnamese Mekong Delta, including the Tien, Hau, Co Chien, Ham Luong, and Cua Dai branches. Dry-season months (November--June) are the primary validated domain; wet-season months (July--October) are extrapolative technical extensions.
Data sources
The publicly available source data used in this study are listed below:
- HLS v2 surface reflectance (Landsat + Sentinel-2), NASA/USGS: [https://lpdaac.usgs.gov/products/hlsv202/](https://lpdaac.usgs.gov/products/hlsv202/)
- Sentinel-1 GRD IW backscatter, ESA/Copernicus: [https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD](https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD)
- JRC Global Surface Water, EC JRC: [https://global-surface-water.appspot.com/](https://global-surface-water.appspot.com/)
- MERIT Hydro, Yamazaki et al.: [http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/](http://hydro.iis.u-tokyo.ac.jp/~yamadai/MERIT_Hydro/)
- DeltaDTM, Pronk et al.: [https://doi.org/10.1038/s41597-024-03135-8](https://doi.org/10.1038/s41597-024-03135-8)
- FES2022 tidal model, LEGOS/Noveltis: [https://www.aviso.altimetry.fr/en/data/products/auxiliary-products/global-tide-fes.html](https://www.aviso.altimetry.fr/en/data/products/auxiliary-products/global-tide-fes.html)
- CROCO hydrodynamic model, IRD/MIOS: forcing fields from Marchesiello et al. (2019)
- Mekong discharge gauges, MRC/national records at Kratie, Tan Chau, and Chau Doc
File inventory
1) Yearly salinity maps (GeoTIFF)
Eleven files (one per year, 2015--2025), each containing 12 monthly bands:
mekong_salinity_12months_waterbody_YYYY.tif
| Property | Value |
| --- | --- |
| **Format** | GeoTIFF (LZW-compressed, 32-bit float) |
| **Coordinate system** | EPSG:4326 (WGS 84) |
| **Spatial resolution** | 0.00027 degrees (~30 m) |
| **Bands** | 12 (January--December, band index = calendar month) |
| **NoData** | NaN |
| **Pixel values** | Salinity in PSU, range 0--35 |
Bands 1--6 (Jan--Jun) and 11--12 (Nov--Dec) correspond to the dry season, the primary validated domain. Bands 7--10 (Jul--Oct) are wet-season extrapolations.
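The band layout above can be sketched in Python as follows. This is a minimal illustration, not part of the published pipeline: the helper names are hypothetical, and `read_month` assumes the third-party `rasterio` package, which this record does not name.

```python
DRY_BANDS = {1, 2, 3, 4, 5, 6, 11, 12}  # Jan-Jun and Nov-Dec: primary validated domain
WET_BANDS = {7, 8, 9, 10}               # Jul-Oct: extrapolative extension


def yearly_map_name(year: int) -> str:
    """Build a yearly map filename from the pattern in the file inventory."""
    if not 2015 <= year <= 2025:
        raise ValueError("maps are provided for 2015-2025 only")
    return f"mekong_salinity_12months_waterbody_{year}.tif"


def season_of_band(band: int) -> str:
    """Return 'dry' or 'wet' for a band index (band index = calendar month)."""
    if band in DRY_BANDS:
        return "dry"
    if band in WET_BANDS:
        return "wet"
    raise ValueError("band index must be between 1 and 12")


def read_month(path: str, month: int):
    """Read one monthly band as a 2-D array; NaN marks NoData pixels."""
    import rasterio  # assumed dependency, imported lazily

    season_of_band(month)  # validates the band index before opening the file
    with rasterio.open(path) as src:
        return src.read(month)  # rasterio band indexes are 1-based
```

For example, `read_month(yearly_map_name(2016), 3)` would return the March 2016 map, and `season_of_band(3)` identifies it as a dry-season band.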
2) Data-quality flags (GeoTIFF)
Eleven companion files providing per-pixel prediction flags:
mekong_salinity_12months_waterbody_YYYY_flags.tif
| Flag value | Meaning |
| --- | --- |
| 0 | Full model prediction (optical + SAR + forcing available) |
| 1 | Fallback model prediction (optical predictors unavailable; SAR + forcing + position only) |
| 2 | No prediction (all predictors missing or masked) |
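One way the flags might be used is to summarize how much of a map comes from each model, or to keep only full-model pixels. The sketch below is illustrative: the function names are hypothetical, and it assumes the flag values have already been read into a flat sequence of integers (the band layout and data type of the flag files are not stated in this record).

```python
from collections import Counter

FLAG_MEANINGS = {0: "full model", 1: "fallback model", 2: "no prediction"}


def flag_fractions(flags):
    """Return the fraction of pixels per flag meaning from a flat iterable of ints."""
    counts = Counter(int(f) for f in flags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no flag values supplied")
    return {
        FLAG_MEANINGS.get(value, f"unknown ({value})"): n / total
        for value, n in sorted(counts.items())
    }


def keep_full_model_only(salinity, flags):
    """Replace every value not produced by the full model (flag 0) with NaN."""
    return [s if int(f) == 0 else float("nan") for s, f in zip(salinity, flags)]
```

With a NumPy array, a call such as `flag_fractions(flag_band.ravel().tolist())` would work, where `flag_band` is a hypothetical 2-D array holding one band of a flags file.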
3) Per-year metadata (JSON)
Eleven metadata files containing per-year processing provenance:
mekong_salinity_12months_waterbody_YYYY.metadata.json
Fields include: year, raster dimensions, number of valid predicted pixels, finite-pixel fraction, source tile identifier, and cropping bounds.
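Because the exact JSON key names are not listed here, a cautious first step is to load a metadata file and inspect its keys before relying on any of them. The following sketch uses hypothetical helper names and assumes the top level of each file is a JSON object.

```python
import json
from pathlib import Path


def load_year_metadata(folder, year):
    """Load the per-year metadata file that follows the naming pattern above."""
    name = f"mekong_salinity_12months_waterbody_{year}.metadata.json"
    with (Path(folder) / name).open(encoding="utf-8") as fh:
        return json.load(fh)


def list_fields(meta):
    """Return the sorted top-level keys, making no assumption about their names."""
    if not isinstance(meta, dict):
        raise TypeError("expected a JSON object at the top level")
    return sorted(meta)
```

Calling `list_fields(load_year_metadata(".", 2016))` would show which provenance fields are actually present for 2016.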
4) Manifest file
Consolidated metadata across all processed years:
mekong_salinity_12months_waterbody_manifest.json
5) Station metadata
Field observation station locations used for model training and validation:
Mekong_Station_Metadata.csv
| Column | Description |
| --- | --- |
| `station_id` | Station name |
| `river_branch` | Distributary branch |
| `longitude_degE` | Longitude (EPSG:4326) |
| `latitude_degN` | Latitude (EPSG:4326) |
| `notes` | Additional information |
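The station table can be read with the standard library alone. The sketch below groups stations by distributary using the column names listed above; the function name is hypothetical, and the file is assumed to be a comma-separated file with a header row.

```python
import csv
from collections import defaultdict


def stations_by_branch(csv_file):
    """Group (station_id, longitude, latitude) tuples by river branch.

    `csv_file` is any open text file or file-like object with a header row.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row["river_branch"]].append(
            (
                row["station_id"],
                float(row["longitude_degE"]),
                float(row["latitude_degN"]),
            )
        )
    return dict(groups)
```

Typical usage would be `with open("Mekong_Station_Metadata.csv", newline="", encoding="utf-8") as fh: groups = stations_by_branch(fh)`.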
6) Saved trained model
A serialized Random Forest regressor trained on the complete 2015--2025 dataset:
mekong_salinity_rf_v1.joblib
mekong_salinity_rf_v1.json
| Property | Value |
| --- | --- |
| **Format** | joblib (Python 3, scikit-learn) |
| **Size** | ~4.6 MB |
| **Target transform** | sqrt(salinity) |
The saved model can be loaded with `joblib.load()` and used to predict salinity for new years from satellite features and hydrological forcing, without requiring access to the raw in-situ observation data. See `scripts/predict_new_data.py` for usage examples.
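Since the target transform is sqrt(salinity), predictions in the transformed space have to be squared to recover PSU. The sketch below shows that step under two assumptions that this record does not confirm: that the serialized regressor returns sqrt-space values directly (rather than wrapping the inverse transform itself), and that `features` already matches the feature order the model expects. The function names are hypothetical; `scripts/predict_new_data.py` remains the authoritative usage reference.

```python
def back_transform(sqrt_predictions, upper=35.0):
    """Square sqrt-space predictions and clip to the stated 0-35 PSU range."""
    salinity = []
    for p in sqrt_predictions:
        p = max(float(p), 0.0)  # a negative sqrt-space value has no physical meaning
        salinity.append(min(p * p, upper))
    return salinity


def predict_salinity(model_path, features):
    """Load the saved regressor and return salinity in PSU (assumptions above apply)."""
    import joblib  # assumed to be installed alongside scikit-learn

    model = joblib.load(model_path)  # unpickles code: only load files you trust
    return back_transform(model.predict(features))
```

For instance, `back_transform([0, 2, -1, 10])` yields `[0.0, 4.0, 0.0, 35.0]`: the negative input is floored at zero and the last value is capped at the upper bound.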
7) CROCO hydrodynamic cache
Pre-computed CROCO 3D surface salinity for 2016, used to generate pseudo-observation boundary anchors:
croco_2016_surface_salt_cache.npz
| Property | Value |
| --- | --- |
| **Format** | NumPy .npz archive |
| **Size** | ~0.1 MB |
| **Content** | Gridded surface salinity for March--September 2016 |
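The array names stored in the cache are not documented here, so a reasonable first step is to list them with their shapes. The following sketch assumes NumPy is installed; the function name is hypothetical.

```python
import numpy as np


def summarize_npz(path):
    """Map each array name in an .npz archive to its (shape, dtype string)."""
    with np.load(path) as archive:  # allow_pickle stays at its safe default (False)
        return {
            name: (archive[name].shape, str(archive[name].dtype))
            for name in archive.files
        }
```

Running `summarize_npz("croco_2016_surface_salt_cache.npz")` would reveal which arrays hold the salinity grid and any accompanying coordinates.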
8) Processing scripts
The `scripts/` folder contains the complete Python pipeline for reproducing the analysis or adapting it to other estuarine systems:
| Script | Description |
| --- | --- |
| `gee_batched_monthly_pipeline.py` | Main GEE pooled training, LOSO, and map export launcher |
| `gee_export_all_12_months.py` | Full 12-month yearly map export launcher |
| `gee_fast_monthly_loso.py` | Feature builder and station sampling logic |
| `gee_forcing_lookup.py` | Monthly discharge and tidal forcing lookup |
| `gee_config.py` | Central configuration (study area, model parameters, asset paths) |
| `local_add_forcing_features.py` | Merge forcing values into matched GEE training tables |
| `local_validate_with_forcing.py` | LOSO/LOYO/temporal validation, bootstrap CIs, SHAP, permutation importance |
| `local_prepare_zenodo_clean_exports.py` | Merge, clip, compress, and package yearly map exports |
| `local_prepare_waterbody_mask.py` | Prepare waterbody mask for GEE and local packaging |
| `local_station_waterbody_qc.py` | Audit station locations against waterbody geometry |
| `_generate_croco_pseudo_obs.py` | Generate CROCO temporal, ocean, freshwater, and estuarine boundary anchors |
| `train_final_model.py` | Train and serialize the production Random Forest model |
| `predict_new_data.py` | Predict from new feature data (CSV or raster) using a saved model |
All scripts require a Google Earth Engine account and the Earth Engine Python API (the `earthengine-api` package, imported as `ee`). When adapting the workflow, replace the GEE project ID in `gee_config.py` with your own project ID.
License
This dataset is distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. Please cite this dataset in any publications or derivative works.
Contact
For questions, please contact the corresponding author listed in the associated manuscript.
Provider:
Zenodo
Created:
2026-05-03



