WaterNet Outputs and Code
Source: NIAID Data Ecosystem. Indexed 2026-05-02.
Download link:
https://doi.org/10.7910/DVN/YY2XMG
Description:
Introduction
This data repository contains global raster and vector outputs of the WaterNet model at 20 meters, along with the code for three Python modules used to generate this data. If you use this data, please cite the original WaterNet paper, which also details the modelling and data production process: Pierson, Matthew, and Mehrabi, Zia. 2024. "Mapping waterways worldwide with deep learning." arXiv. https://doi.org/10.48550/arXiv.2412.00050
Projection
All files use EPSG:4326.
Raster Data

Naming Convention
The raster outputs are named {xtile}_{ytile}.tif, and each file corresponds to a zoom level 6 xyz map tile. We have included a geoparquet file named WaterNet_20m_raster_files_bbox.parquet that contains the bounding box of each raster file.

Datatypes
The raster files have an unsigned 8-bit integer datatype, with 0 indicating a low probability of a waterway and 255 indicating a high probability of a waterway. During the vectorization process, we apply two sharp cut-offs: a value less than 255*0.1 is considered not a waterway, and a value greater than 255*0.5 is considered a waterway if it is connected to a stream in TDX-Hydro (i.e., if its connected component intersects a stream in TDX-Hydro). See the paper for more information.

Known Issues
There are known issues with the raster product. These include lower accuracy in capturing swampy areas and deserts (although alternative feature weightings of the model can capture swamps better, and our vectorization process aims to resolve the issues in deserts). We also note higher noise in areas where it is difficult to create cloud-free composites (coastal areas, near the equator); future integration of SAR data may help alleviate this, particularly in a near-real-time deployment. There is also an artifact in Greenland (missing cells) that we expect is due to the Sentinel-2 feature inputs, and which likely requires further investigation and backfilling.
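As a rough illustration of the raster naming convention and the two cut-offs described above, the following sketch parses a raster filename, computes an approximate bounding box for the tile, and applies the thresholds to a single pixel value. It assumes the standard slippy-map (Web Mercator) XYZ tiling scheme, which the description does not state explicitly; the bounding boxes in WaterNet_20m_raster_files_bbox.parquet remain the authoritative source. The function names and the "intermediate" label are hypothetical and used only for illustration.

```python
import math

ZOOM = 6  # zoom level given in the naming convention


def parse_raster_name(filename: str):
    """Split '{xtile}_{ytile}.tif' into integer tile indices."""
    stem = filename.rsplit(".", 1)[0]
    xtile, ytile = stem.split("_")
    return int(xtile), int(ytile)


def tile_bounds(xtile: int, ytile: int, zoom: int = ZOOM):
    """Return (west, south, east, north) in degrees for an XYZ tile.

    Assumption: standard slippy-map tiling, with y increasing southward.
    """
    n = 2 ** zoom

    def lon(x):
        return x / n * 360.0 - 180.0

    def lat(y):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))

    return lon(xtile), lat(ytile + 1), lon(xtile + 1), lat(ytile)


# Cut-offs quoted in the description (raster values range from 0 to 255).
NOT_WATERWAY_BELOW = 255 * 0.1
WATERWAY_ABOVE = 255 * 0.5


def classify_pixel(value: int) -> str:
    """Apply the two cut-offs to one raster value."""
    if value < NOT_WATERWAY_BELOW:
        return "not_waterway"
    if value > WATERWAY_ABOVE:
        # Still subject to the TDX-Hydro connectivity check in the paper.
        return "candidate_waterway"
    # Hypothetical label: handling of this range is described in the paper.
    return "intermediate"


if __name__ == "__main__":
    x, y = parse_raster_name("32_21.tif")
    print(tile_bounds(x, y))
    print(classify_pixel(200))
```

Note that a value above 255*0.5 is only a candidate: per the description, it counts as a waterway only if its connected component intersects a stream in TDX-Hydro, which this sketch does not check.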
Vector Data

Naming Convention
The vector outputs are named {hydrobasins level 2 id}_{part}.parquet. The level 2 HydroBasins can be obtained from the HydroBasins website. As the naming convention suggests, each file corresponds to part of a HydroBasins level 2 basin. Since many of the files were larger than 2.5 GB, they were split into parts to satisfy the Harvard Dataverse individual file size limit.

Dataframe Column Descriptions
stream_id (int): A unique id for the stream segment.
target_stream_id (int): The stream_id of the target, i.e., the stream_id of the next adjacent downstream stream.
source_stream_ids (list(int)): A list of the source stream_ids, i.e., the ids of all adjacent streams flowing into this stream. Can contain more than 2 ids.
stream_order (int): The Strahler stream order of this stream segment.
from_tdx (bool): True if this stream segment appears in TDX-Hydro, otherwise False. Note that this may only be a segment of a stream in TDX-Hydro.
tdx_stream_id (int): Each stream in this dataset falls in a drainage basin in the TDX-Hydro dataset. This value corresponds to the streamID in the TDX-Hydro drainage basins dataset that this stream is in (or LINKNO in the TDX-Hydro stream network dataset).
intersects_lake (bool): True if the geometry intersects a lake in the HydroLakes v1.0 dataset.
length_m (data type not specified): The length of the geometry, computed using pyproj.Geod.geometry_length.
geometry (LineString, i.e., polyline): The geometry of this segment.

Known Issues
Vectorization of any large body of water (lakes, swamps, wide rivers, etc.) can result in multiple geometries that should all be a single geometry. One method for identifying such artifacts is to search for streams of order 1 whose target streams are high in order.
Since these artifacts often occur in lakes, we have included the column "intersects_lake" in the output, which indicates if the geometry intersects a lake in HydroLakes.
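The artifact search suggested above can be sketched as follows. This is a minimal, dependency-free illustration over rows represented as dictionaries with the column names listed in this description; in practice the parquet files would more likely be loaded with a library such as geopandas. The helper names and the threshold of 4 for a "high" stream order are assumptions made for illustration, not values from the source.

```python
def parse_vector_name(filename: str):
    """Split '{hydrobasins level 2 id}_{part}.parquet' into (basin_id, part).

    Both pieces are kept as strings, since their exact format is not
    specified in the description.
    """
    stem = filename.rsplit(".", 1)[0]
    basin_id, part = stem.rsplit("_", 1)
    return basin_id, part


def flag_possible_artifacts(rows, min_target_order=4):
    """Return (stream_id, intersects_lake) for order-1 streams whose
    target stream has an order of at least min_target_order.

    min_target_order is an illustrative assumption, not a documented value.
    """
    order_by_id = {r["stream_id"]: r["stream_order"] for r in rows}
    flagged = []
    for r in rows:
        if r["stream_order"] != 1:
            continue
        # The target may live in another part file, so it can be missing here.
        target_order = order_by_id.get(r["target_stream_id"])
        if target_order is not None and target_order >= min_target_order:
            flagged.append((r["stream_id"], r["intersects_lake"]))
    return flagged


if __name__ == "__main__":
    # Toy rows for illustration only; not taken from the dataset.
    toy_rows = [
        {"stream_id": 1, "target_stream_id": 3, "stream_order": 1, "intersects_lake": True},
        {"stream_id": 2, "target_stream_id": 4, "stream_order": 1, "intersects_lake": False},
        {"stream_id": 3, "target_stream_id": 5, "stream_order": 5, "intersects_lake": True},
        {"stream_id": 4, "target_stream_id": 3, "stream_order": 2, "intersects_lake": False},
    ]
    print(flag_possible_artifacts(toy_rows))
```

Because a basin is split across several part files, a target stream may not be present in the same file as its source; the sketch skips such rows rather than guessing, so a complete check would need all parts of a basin loaded together.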
WaterNet Code
The code needed to build and train WaterNet and to vectorize its outputs is supplied in WaterNet_code directly in this repository. The code for global inference with WaterNet has not been included, as the large amount of data required makes such an effort difficult to generalize across compute hardware. All of the code will also be made public on GitHub in the near future, as three modules:
WaterNet
WaterNet Training and Evaluation
WaterNet Vectorization
Acknowledgments
The authors would like to thank Bridges to Prosperity for enabling the project. This project was funded in part by Bridges to Prosperity under the grant “Remote Impact Assessment of Rural Infrastructure Development” to the Better Planet Laboratory (https://betterplanetlab.com/).
Created:
2024-12-04



