Bacteria Prediction Dataset, Code, and Documentation for Parkside Aquatic Park (2019–2025)
Source: NIAID Data Ecosystem. Indexed: 2026-05-10
Download link:
https://figshare.com/articles/dataset/Bacteria_Prediction_Dataset_Code_and_Documentation_for_Parkside_Aquatic_Park_2019_2025_/30749243
Description:
This dataset and analysis package support a machine-learning study on predicting fecal indicator bacteria levels at Parkside Aquatic Park in San Mateo, California. The dataset integrates publicly available environmental records from three sources:
- Surface Water Fecal Indicator Bacteria Monitoring Results (California Open Data Portal)
- Hourly weather observations (NOAA Global Historical Climatology Network)
- Sanitary sewer overflow (SSO) spill reports (California State Water Resources Control Board)

The compiled CSV file contains over 800 water-quality samples collected between 2019 and 2025, with engineered environmental features representing recent conditions prior to each sample. Variables include:
- Result: measured E. coli or Enterococcus (MPN/100 mL)
- rain_1day: rainfall in the previous 24 hours (mm)
- rain_3day_sum: cumulative rainfall in the previous 3 days (mm)
- temp_3day_avg: average air temperature in the previous 3 days (°F)
- sso_7day_reachsurf_count: number of sewage spills in the previous 7 days that reached surface waters
- sso_7day_totalvolume: total volume of those spills (gallons)

This Figshare item includes:
- aquatic_park_final.csv: the complete cleaned dataset
- Jupyter Notebook (analysis code) used for preprocessing, exploratory analysis, feature engineering, model training, and figure generation
- Documentation (.md) containing the exact Excel formulas and data-source references used to construct the dataset
- PDF manuscript describing the research motivation, methods, results, and model evaluation (Logistic Regression, Random Forest, and Log-Transformed Random Forest)

The goal of this project is to support environmental modeling, community water-safety research, and reproducible machine-learning studies. All files are provided for reuse under the CC BY 4.0 license.
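The trailing-window weather variables described above (rain_1day, rain_3day_sum, temp_3day_avg) could be derived from hourly observations roughly as sketched below. This is an illustration only: the dataset's documentation states that the features were built with Excel formulas, so this pandas version is a stand-in, not the authors' method. The input column names (sample_time, time, rain_mm, temp_f), the helper name add_weather_features, and the choice of half-open windows ending at the sample time are all assumptions, and the demo data is synthetic.

```python
import pandas as pd


def add_weather_features(samples: pd.DataFrame, hourly: pd.DataFrame) -> pd.DataFrame:
    """Attach trailing-window weather features to each water sample.

    samples: one row per sample with a 'sample_time' datetime column (hypothetical name).
    hourly:  hourly weather with 'time', 'rain_mm', 'temp_f' columns (hypothetical names).
    """
    hourly = hourly.sort_values("time").set_index("time")
    rows = []
    for t in samples["sample_time"]:
        # Assumed window definition: [t - window, t), so only
        # conditions strictly before the sample are counted.
        in_1d = (hourly.index >= t - pd.Timedelta(days=1)) & (hourly.index < t)
        in_3d = (hourly.index >= t - pd.Timedelta(days=3)) & (hourly.index < t)
        rows.append(
            {
                "rain_1day": hourly.loc[in_1d, "rain_mm"].sum(),
                "rain_3day_sum": hourly.loc[in_3d, "rain_mm"].sum(),
                "temp_3day_avg": hourly.loc[in_3d, "temp_f"].mean(),
            }
        )
    return pd.concat([samples.reset_index(drop=True), pd.DataFrame(rows)], axis=1)


# Synthetic demo: 96 hours of constant light rain and constant temperature.
times = pd.Timestamp("2024-01-01") + pd.to_timedelta(list(range(96)), unit="h")
hourly = pd.DataFrame({"time": times, "rain_mm": 0.5, "temp_f": 60.0})
samples = pd.DataFrame({"sample_time": [pd.Timestamp("2024-01-04 12:00")]})

features = add_weather_features(samples, hourly)
print(features)
```

The explicit loop favors readability over speed; for a dataset of roughly 800 samples that trade-off is unlikely to matter. The SSO features could be built the same way with a 7-day window over the spill reports.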
Created:
2025-12-01



