Algae bloom dataset
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/5jb9ffpmvr
Description:
The dataset used in this study comprises remotely sensed satellite imagery and derived tabular features stored in CSV format, designed to support scalable prediction of Harmful Algal Bloom (HAB) severity across freshwater bodies in the continental United States. All data sources are publicly available, ensuring transparency, reproducibility, and broad applicability.
1. Image Dataset
The image component of the dataset consists of multispectral satellite imagery acquired from the Sentinel-2 mission, provided by the European Space Agency (ESA). Sentinel-2 offers high spatial resolution imagery suitable for inland water quality assessment.
Satellite platform: Sentinel-2A and Sentinel-2B
Spatial resolution: 10 m, 20 m, and 60 m (band-dependent; 10 m bands primarily used)
Spectral bands used: Blue (B2), Green (B3), Red (B4), Near-Infrared (B8), and red-edge bands (B5, B6, B7)
Coverage: Freshwater bodies across the continental United States
Temporal characteristics: Cloud-filtered scenes corresponding to HAB observation periods
These images capture surface reflectance characteristics of water bodies, which are critical indicators of chlorophyll concentration, turbidity, and algal biomass associated with HAB events.
The CSV file serves as the integrated feature repository, combining spectral, topographical, and derived environmental attributes for machine learning modeling.
Each row in the CSV corresponds to a unique water body observation, with columns representing:
Spectral features: band reflectance values; vegetation and water indices (e.g., NDVI-based and water-specific indices)
Terrain features: elevation, slope, and watershed characteristics
Geospatial metadata: latitude and longitude; water body identifiers
Target variable: categorical HAB severity level (e.g., low, moderate, high)
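As an illustration of how index columns of this kind can be derived from band reflectance, the following minimal sketch computes three normalized-difference indices. The description does not state which indices the dataset actually contains, so the choice of NDVI, NDWI (McFeeters), and NDCI is an assumption, as are the function names and the sample reflectance values.

```python
def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or NaN when the denominator is zero."""
    denom = a + b
    return (a - b) / denom if denom != 0 else float("nan")


def spectral_indices(b3: float, b4: float, b5: float, b8: float) -> dict:
    """Compute three common indices from Sentinel-2 surface reflectance.

    Which indices the dataset really stores is not documented; these
    three are assumed here purely for illustration.
    """
    return {
        "ndvi": normalized_difference(b8, b4),  # NIR vs. red
        "ndwi": normalized_difference(b3, b8),  # green vs. NIR (McFeeters)
        "ndci": normalized_difference(b5, b4),  # red edge vs. red (chlorophyll)
    }


# Made-up reflectance values for a single water pixel
row = spectral_indices(b3=0.06, b4=0.04, b5=0.07, b8=0.02)
for name, value in row.items():
    print(f"{name}: {value:.3f}")
```

Open water typically reflects little near-infrared light, so a positive NDWI together with a negative NDVI is the expected pattern for a clear water pixel, while a rising NDCI suggests more chlorophyll near the surface.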
The CSV dataset enables efficient training of ensemble gradient boosting models without the need for computationally intensive image-based deep learning pipelines.
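A tabular workflow of this kind might look like the sketch below. It is not the authors' pipeline: the description does not name a library, so scikit-learn's `GradientBoostingClassifier` is used as a stand-in, and the feature columns, the synthetic data, and the rule that generates the severity labels are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-ins for CSV columns (hypothetical): ndvi, ndci, elevation_m
X = np.column_stack([
    rng.uniform(-0.5, 0.5, n),
    rng.uniform(-0.2, 0.6, n),
    rng.uniform(0, 2000, n),
])

# Synthetic severity label driven by the second column, for illustration only
y = np.digitize(X[:, 1], bins=[0.1, 0.35])  # 0 = low, 1 = moderate, 2 = high

# Stratify so each severity class appears in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier(n_estimators=100, max_depth=3, random_state=0)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

With the real data, the synthetic arrays would be replaced by the feature and target columns read from the CSV. Because observations of the same water body are likely to be correlated, a split grouped by water body identifier may give a more honest estimate than a random split.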
4. Data Preprocessing and Integration
Satellite images were preprocessed to remove cloud contamination and normalized to surface reflectance values. Spatial joins were used to align the spectral information with the corresponding terrain features derived from a digital elevation model (DEM). The final structured dataset was exported in CSV format so that it can be used for modeling across different computational environments.
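The alignment of point observations with gridded terrain data can be sketched as a lookup of the DEM cell that contains each observation. The grid layout, origin, cell size, field names, and values below are all hypothetical; the description does not say which spatial join method or tooling was used.

```python
def sample_grid(grid, origin_lon, origin_lat, cell_size, lon, lat):
    """Return the value of the cell containing (lon, lat), or None if outside.

    The grid is assumed to be north-up: row 0 is the northernmost row and
    (origin_lon, origin_lat) is the upper-left corner of the grid.
    """
    col = int((lon - origin_lon) // cell_size)
    row = int((origin_lat - lat) // cell_size)
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical 2 x 3 elevation grid in metres, with 0.1 degree cells
dem = [
    [310.0, 305.0, 300.0],
    [295.0, 290.0, 288.0],
]

# Hypothetical spectral observations keyed by water body
observations = [
    {"waterbody_id": "A", "lon": -93.95, "lat": 45.05, "ndvi": 0.12},
    {"waterbody_id": "B", "lon": -93.75, "lat": 44.95, "ndvi": 0.31},
    {"waterbody_id": "C", "lon": -92.00, "lat": 45.05, "ndvi": 0.05},
]

# Attach the elevation of the containing cell to each observation
joined = [
    {**obs, "elevation_m": sample_grid(dem, -94.0, 45.1, 0.1, obs["lon"], obs["lat"])}
    for obs in observations
]
for record in joined:
    print(record["waterbody_id"], record["elevation_m"])
```

Observations that fall outside the grid receive `None` rather than a borrowed value, which makes unmatched rows easy to find before export. A production pipeline would more likely rely on a geospatial library and a projected coordinate system than on a hand-written lookup.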
Created:
2026-02-04



