
Baltic Sea and algae blooms analysis - Galaxy.eu workflow

Source: DataCite Commons | Updated: 2025-10-30 | Indexed: 2026-05-04
Download link:
https://mostwiedzy.pl/en/open-research-data/baltic-sea-and-algae-blooms-analysis-galaxy-eu-workflow,10301012261026951-0
Description:
This is a multiple regression analysis workflow designed to predict algal bloom risk in the Baltic Sea based on oceanographic and nutrient data. The workflow combines data preprocessing, statistical modeling, and spatial visualization to assess water quality at bathing sites.

The workflow aims to:
- Predict algal bloom risk (chlorophyll levels) based on environmental factors:
  - Nutrient concentrations (NO3, PO4, NH4)
  - Sea surface temperature
- Assess water quality at bathing sites by:
  - Mapping bloom risk zones across the Baltic Sea
  - Classifying sites into risk categories (very low to very high)
  - Generating spatial risk maps with interpolation
- Provide statistical analysis including:
  - Multiple linear regression modeling
  - Model diagnostics (R², RMSE, MAE)
  - Variable significance testing (p-values)
  - Multicollinearity assessment (VIF)

Workflow Steps:
1-3: Data extraction and preprocessing
- Extract metadata from NetCDF files (xarray_metadata_info)
- Select specific variables (chl, no3, po4, nh4, sst_trend) from datasets
- Cut relevant columns from bathing water data
4: Data merging
- Merge biogeochemical variables by coordinates
- Add temperature data with geographic tolerance (±0.1° ≈ 11 km)
- Handle temporal averaging for time-series data
- Memory-efficient chunked processing for large datasets
5: Statistical modeling (Jupyter notebook execution)
- Multiple regression analysis with chlorophyll as dependent variable
- Split data into training (80%) and test (20%) sets
- Standardize features for comparison
- Calculate regression coefficients and significance
6: Model validation
- R² score, RMSE, MAE metrics
- Residual analysis and normality tests
- Q-Q plots and diagnostic visualizations
7: Spatial visualization
- Generate 3 bloom risk maps:
  - Actual chlorophyll levels (smooth gradient)
  - Predicted levels from regression model (smooth gradient)
  - Interpolated risk zones with contour lines
- Risk classification thresholds (adjusted to data):
  - Very low: <1.5 μg/L
  - Low: 1.5-2 μg/L
  - Medium: 2-3 μg/L
  - High: 3-5 μg/L
  - Very high: >5 μg/L
8: Export results
- 14 output files including visualizations, CSV data, and statistical reports
- All files exported to Galaxy History for download

The workflow employs:
- Statistical technique: Multiple linear regression (OLS)
- Interpolation: Cubic spline for smooth spatial gradients
- Geographic matching: Tolerance-based coordinate merging
- Quality control: VIF for multicollinearity, p-values for significance
- Validation: Train-test split with standardization

This comprehensive workflow enables environmental monitoring and early warning for harmful algal blooms at Baltic Sea bathing sites.
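Two of the steps described above, tolerance-based coordinate matching (step 4) and risk classification (step 7), can be illustrated with a minimal Python sketch. This is not the workflow's own code. The function names `classify_bloom_risk` and `match_within_tolerance`, the dictionary layout of the grid points, and the sample coordinates and `sst_trend` values are all hypothetical. The description does not say which class a value exactly on a boundary belongs to, so the sketch assumes that 2 and 3 μg/L fall into the higher class, while honoring the stated "<1.5" and ">5" limits. It also assumes the ±0.1° tolerance applies to latitude and longitude separately and that the nearest candidate wins.

```python
from math import hypot


def classify_bloom_risk(chl_ug_per_l: float) -> str:
    """Map a chlorophyll value (μg/L) to the risk class listed in step 7.

    Assumption: values exactly at 2 or 3 μg/L go to the higher class;
    the source only fixes "<1.5" (Very low) and ">5" (Very high).
    """
    if chl_ug_per_l < 1.5:
        return "Very low"
    if chl_ug_per_l < 2.0:
        return "Low"
    if chl_ug_per_l < 3.0:
        return "Medium"
    if chl_ug_per_l <= 5.0:
        return "High"
    return "Very high"


def match_within_tolerance(site, grid_points, tol_deg=0.1):
    """Return the nearest grid point within ±tol_deg of the site, or None.

    `site` is a (lat, lon) tuple; each grid point is a dict with "lat" and
    "lon" keys (a hypothetical layout chosen for this sketch).
    """
    lat, lon = site
    # Keep only points inside the tolerance box around the site.
    candidates = [
        p for p in grid_points
        if abs(p["lat"] - lat) <= tol_deg and abs(p["lon"] - lon) <= tol_deg
    ]
    if not candidates:
        return None
    # Among the candidates, pick the closest one in degree space.
    return min(candidates, key=lambda p: hypot(p["lat"] - lat, p["lon"] - lon))


# Made-up illustrative temperature grid points, not taken from the dataset.
sst_grid = [
    {"lat": 54.40, "lon": 18.60, "sst_trend": 0.03},
    {"lat": 54.45, "lon": 18.66, "sst_trend": 0.05},
    {"lat": 57.00, "lon": 20.00, "sst_trend": 0.01},
]

bathing_site = (54.44, 18.65)
matched = match_within_tolerance(bathing_site, sst_grid)
print(matched)
print(classify_bloom_risk(2.4))
```

Comparing in degree space keeps the sketch short. A production merge would more likely use a great-circle distance, since 0.1° of longitude covers less ground than 0.1° of latitude at Baltic latitudes.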
Provider:
Gdańsk University of Technology
Created:
2025-10-30