Baltic Sea and algae blooms analysis - Galaxy.eu workflow
Source: DataCite Commons. Updated 2025-10-30; indexed 2026-05-04.
Download link:
https://mostwiedzy.pl/en/open-research-data/baltic-sea-and-algae-blooms-analysis-galaxy-eu-workflow,10301012261026951-0
Resource description:
This is a multiple regression analysis workflow designed to predict algal bloom risk in the Baltic Sea based on oceanographic and nutrient data. The workflow combines data preprocessing, statistical modeling, and spatial visualization to assess water quality at bathing sites.
The workflow aims to:
Predict algal bloom risk (chlorophyll levels) based on environmental factors:
Nutrient concentrations (NO3, PO4, NH4)
Sea surface temperature
Assess water quality at bathing sites by:
Mapping bloom risk zones across the Baltic Sea
Classifying sites into risk categories (very low to very high)
Generating spatial risk maps with interpolation
Provide statistical analysis including:
Multiple linear regression modeling
Model diagnostics (R², RMSE, MAE)
Variable significance testing (p-values)
Multicollinearity assessment (VIF)
Workflow Steps:
1-3: Data extraction and preprocessing
Extract metadata from NetCDF files (xarray_metadata_info)
Select specific variables (chl, no3, po4, nh4, sst_trend) from datasets
Cut relevant columns from bathing water data
4: Data merging
Merge biogeochemical variables by coordinates
Add temperature data with a geographic tolerance of ±0.1° (≈ 11 km north-south; roughly 6 km east-west at Baltic latitudes)
Handle temporal averaging for time-series data
Memory-efficient chunked processing for large datasets
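A minimal plain-Python sketch of the tolerance-based merge and temporal averaging in step 4, for illustration only; the workflow's own merge code is not reproduced here. The function names (average_over_time, merge_with_tolerance), the record layout, and the nearest-point rule (smallest squared offset in degrees among grid points within ±0.1° in both latitude and longitude) are assumptions.

```python
from statistics import mean

# Hypothetical helpers illustrating step 4; not the workflow's own code.

def average_over_time(series):
    """Collapse a time series {timestamp: value} to its mean, ignoring missing (None) values."""
    values = [v for v in series.values() if v is not None]
    return mean(values) if values else None

def merge_with_tolerance(sites, grid, tol=0.1):
    """Attach the nearest gridded value (e.g. time-averaged SST) to each site.

    sites: list of dicts with 'lat' and 'lon' keys (degrees)
    grid:  list of (lat, lon, value) tuples (degrees)
    A grid point is a candidate only if it lies within +/- tol degrees in both
    latitude and longitude; among candidates, the smallest squared offset in
    degrees wins. Sites with no candidate get None.
    """
    merged = []
    for site in sites:
        best_value, best_dist = None, None
        for lat, lon, value in grid:
            dlat = abs(lat - site["lat"])
            dlon = abs(lon - site["lon"])
            if dlat <= tol and dlon <= tol:
                dist = dlat ** 2 + dlon ** 2
                if best_dist is None or dist < best_dist:
                    best_value, best_dist = value, dist
        merged.append({**site, "sst": best_value})
    return merged
```

Because the match is done in degrees, a ±0.1° window is narrower east-west than north-south at Baltic latitudes; a great-circle (haversine) distance would be the natural refinement if that matters.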
5: Statistical modeling (Jupyter notebook execution)
Multiple regression analysis with chlorophyll as dependent variable
Split data into training (80%) and test (20%) sets
Standardize features so that regression coefficients are comparable across predictors
Calculate regression coefficients and significance
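The modeling in step 5 can be sketched as follows, assuming plain-Python lists of predictor rows (for example NO3, PO4, NH4, SST) and a chlorophyll target. The helper names, the random seed, and the choice to compute standardization statistics on the training rows only are assumptions, not details taken from the workflow's notebook. OLS is solved through the normal equations, which is adequate for a handful of predictors; p-values are omitted because they need a t-distribution that the standard library does not provide.

```python
import random
from statistics import mean, pstdev

# Hypothetical helpers illustrating step 5; not the workflow's notebook code.

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and return (train, test), 80/20 by default."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def fit_standardizer(X):
    """Per-column mean and population std, computed from training rows only."""
    cols = list(zip(*X))
    return [mean(c) for c in cols], [pstdev(c) or 1.0 for c in cols]

def standardize(X, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in X]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def ols_fit(X, y):
    """OLS via the normal equations (X'X) beta = X'y; beta[0] is the intercept."""
    Xd = [[1.0] + list(row) for row in X]
    p = len(Xd[0])
    XtX = [[sum(r[i] * r[j] for r in Xd) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(Xd, y)) for i in range(p)]
    return solve(XtX, Xty)

def ols_predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
```

When the model is fitted on standardized features, the slopes beta[1:] are expressed per standard deviation of each predictor, which is what makes their magnitudes comparable.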
6: Model validation
R² score, RMSE, MAE metrics
Residual analysis and normality tests
Q-Q plots and diagnostic visualizations
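The validation metrics in step 6 follow their standard definitions. The sketch below, with hypothetical function names, computes R², RMSE and MAE and also produces the point pairs for a normal Q-Q plot of residuals; the plotting position (i - 0.5) / n is one common convention and an assumption here, not necessarily the one used by the workflow.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical helpers illustrating step 6.

def regression_metrics(y_true, y_pred):
    """R^2, RMSE and MAE for paired observed/predicted values."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    ss_res = sum(r * r for r in residuals)
    y_bar = mean(y_true)
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }

def qq_points(residuals):
    """(theoretical standard-normal quantile, ordered residual) pairs for a Q-Q plot."""
    n = len(residuals)
    nd = NormalDist()  # standard normal, mu=0, sigma=1
    return [(nd.inv_cdf((i - 0.5) / n), r)
            for i, r in enumerate(sorted(residuals), start=1)]
```

Points that fall close to a straight line in the Q-Q plot indicate approximately normal residuals.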
7: Spatial visualization
Generate 3 bloom risk maps:
Actual chlorophyll levels (smooth gradient)
Predicted levels from regression model (smooth gradient)
Interpolated risk zones with contour lines
Risk classification thresholds (adjusted to data):
Very low: <1.5 μg/L
Low: 1.5-2 μg/L
Medium: 2-3 μg/L
High: 3-5 μg/L
Very high: >5 μg/L
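The classification thresholds above amount to a simple lookup. The source gives the ranges but not which category receives a value lying exactly on 2 or 3 μg/L; the sketch below assumes each lower bound is inclusive, and it keeps exactly 5 μg/L in "high" because "very high" is stated as > 5 μg/L. The function name is hypothetical.

```python
def classify_bloom_risk(chl_ug_per_l):
    """Map a chlorophyll concentration (μg/L) to a bloom risk category.

    The < 1.5 and > 5 cut-offs follow the listed thresholds; inclusive lower
    bounds at 2 and 3 μg/L are an assumption.
    """
    if chl_ug_per_l < 1.5:
        return "very low"
    if chl_ug_per_l < 2.0:
        return "low"
    if chl_ug_per_l < 3.0:
        return "medium"
    if chl_ug_per_l <= 5.0:
        return "high"
    return "very high"
```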
8: Export results
14 output files including visualizations, CSV data, and statistical reports
All files exported to Galaxy History for download
The workflow employs:
Statistical technique: Multiple linear regression (OLS)
Interpolation: Cubic spline for smooth spatial gradients
Geographic matching: Tolerance-based coordinate merging
Quality control: VIF for multicollinearity, p-values for significance
Validation: Train-test split with standardization
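The VIF check is defined as VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on all the other predictors. An equivalent shortcut, used in the sketch below, reads VIF_j off the diagonal of the inverse of the predictors' correlation matrix. This is an illustrative plain-Python version with hypothetical function names; the workflow itself may compute VIF differently, for example with a statistics library.

```python
from statistics import mean

# Hypothetical helpers illustrating the VIF quality-control check.

def correlation_matrix(columns):
    """Pearson correlations between predictor columns (equal-length lists)."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def vif(columns):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]
```

Uncorrelated predictors give VIF = 1; the value grows without bound as a predictor becomes a linear combination of the others.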
The workflow supports environmental monitoring and early warning of harmful algal blooms at Baltic Sea bathing sites.
Providing institution:
Gdańsk University of Technology
Created:
2025-10-30



