A Comprehensive Surface Water Quality Monitoring Dataset (1940-2023): 2.82Million Record Resource for Empirical and ML-Based Research
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-05-07
Download link:
https://figshare.com/articles/dataset/A_Comprehensive_Surface_Water_Quality_Monitoring_Dataset_1940-2023_2_82Million_Record_Resource_for_Empirical_and_ML-Based_Research/27800394/2
Description:
<b>Data Description</b><br>
<b>Water Quality Parameters</b>: Ammonia, BOD, DO, Orthophosphate, pH, Temperature, Nitrogen, Nitrate.<br>
<b>Countries/Regions</b>: United States, Canada, Ireland, England, China.<br>
<b>Years Covered</b>: 1940-2023.<br>
<b>Data Records</b>: 2.82 million.<br>
<br>
<b>Definition of Columns</b><br>
<b>Country</b>: Name of the water-body region.<br>
<b>Area</b>: Name of the area in the region.<br>
<b>Waterbody Type</b>: Type of the water-body source.<br>
<b>Date</b>: Date of the sample collection (dd-mm-yyyy).<br>
<b>Ammonia (mg/l)</b>: Ammonia concentration.<br>
<b>Biochemical Oxygen Demand (BOD) (mg/l)</b>: Oxygen demand measurement.<br>
<b>Dissolved Oxygen (DO) (mg/l)</b>: Concentration of dissolved oxygen.<br>
<b>Orthophosphate (mg/l)</b>: Orthophosphate concentration.<br>
<b>pH (pH units)</b>: pH level of water.<br>
<b>Temperature (°C)</b>: Temperature in Celsius.<br>
<b>Nitrogen (mg/l)</b>: Total nitrogen concentration.<br>
<b>Nitrate (mg/l)</b>: Nitrate concentration.<br>
<b>CCME_Values</b>: Calculated water quality index values using the CCME WQI model.<br>
<b>CCME_WQI</b>: Water Quality Index classification based on CCME_Values.<br>
<br>
<b>Data Directory Description:</b><br>
<b>Category 1: Dataset</b><br>
<b>Combined Data: </b>This folder contains two files: <i>Combined_dataset.csv</i> and <i>Summary.xlsx</i>. The <i>Combined_dataset.csv</i> file includes all eight water quality parameter readings across five countries, with additional data for initial preprocessing steps such as missing value handling, outlier detection, and other operations. It also contains the CCME Water Quality Index calculation for empirical analysis and ML-based research. The <i>Summary.xlsx</i> file provides a brief description of the datasets, including data distributions (e.g., maximum, minimum, mean, standard deviation).<br>
<i>Combined_dataset.csv</i><br>
<i>Summary.xlsx</i><br>
<b>Country-wise Data: </b>This folder contains separate country-based datasets in CSV files. Each file includes the eight water quality parameters for regional analysis.
The <i>Summary_country.xlsx</i> file presents country-wise dataset descriptions with data distributions (e.g., maximum, minimum, mean, standard deviation).<br>
<i>England_dataset.csv</i><br>
<i>Canada_dataset.csv</i><br>
<i>USA_dataset.csv</i><br>
<i>Ireland_dataset.csv</i><br>
<i>China_dataset.csv</i><br>
<i>Summary_country.xlsx</i><br>
<br>
<b>Category 2: Code</b><br>
Data processing and harmonization code (e.g., Language Conversion, Date Conversion, Parameter Naming and Unit Conversion, Missing Value Handling, WQI Measurement and Classification).<br>
<i>Data_Processing_Harmonnization.ipynb</i><br>
Technical validation code (e.g., assessing the Data Distribution, Outlier Detection, Water Quality Trend Analysis, and Verifying the Application of the Dataset for the ML Models).<br>
<i>Technical_Validation.ipynb</i><br>
<br>
<b>Category 3: Data Collection Sources</b><br>
This category includes links to the selected dataset sources, which were used to create the dataset and are provided for further reconstruction or data formation.<br>
<i>DataCollectionSources.xlsx</i><br>
<br>
<b>Original Paper Title: </b>A Comprehensive Dataset of Surface Water Quality Spanning 1940-2023 for Empirical and ML Adopted Research<br>
<b>Abstract</b><br>
Assessment and monitoring of surface water quality are essential for food security, public health, and ecosystem protection. Although water quality monitoring is a known phenomenon, little effort has been made to offer a comprehensive and harmonized dataset for surface water at the global scale. This study presents a comprehensive surface water quality dataset that preserves spatio-temporal variability, integrity, consistency, and depth of the data to facilitate empirical and data-driven evaluation, prediction, and forecasting.
The dataset is assembled from a range of sources, including regional and global water quality databases, water management organizations, and individual research projects from five prominent countries, namely the USA, Canada, Ireland, England, and China. The resulting dataset consists of 2.82 million measurements of eight water quality parameters that span 1940-2023. This dataset can support meta-analysis of water quality models and can facilitate Machine Learning (ML) based data and model-driven investigation of the spatial and temporal drivers and patterns of surface water quality at a cross-regional to global scale.<br><b>Note:</b> Cite this repository and the original paper when using this dataset.<br><br>
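The description says that CCME_Values holds index values from the CCME WQI model and that CCME_WQI holds the resulting classification. The sketch below illustrates how such values can be derived from rows in the described column layout, using the published CCME formula (scope F1, frequency F2, amplitude F3). It is not the authors' notebook code. The sample rows, the `OBJECTIVES` thresholds, and the helper names (`load_rows`, `excursion`, `ccme_wqi`, `classify`) are hypothetical, and the CSV headers are assumed to match the column definitions given in the description, which should be checked against the actual files.

```python
import csv
import io
import math
from datetime import datetime

# Hypothetical rows; headers are assumed to match the column definitions above.
SAMPLE = """Country,Area,Waterbody Type,Date,Dissolved Oxygen (DO) (mg/l),pH (pH units),Nitrate (mg/l)
Ireland,Example Area,River,15-03-2020,8.1,7.4,2.0
Ireland,Example Area,River,16-04-2020,4.9,7.9,3.5
Ireland,Example Area,River,17-05-2020,7.2,9.4,1.1
"""

# Illustrative objectives only (NOT the thresholds used by the dataset authors).
# Each parameter maps to (lower bound, upper bound); None means unbounded.
OBJECTIVES = {
    "Dissolved Oxygen (DO) (mg/l)": (5.5, None),
    "pH (pH units)": (6.5, 9.0),
    "Nitrate (mg/l)": (None, 13.0),
}


def load_rows(text):
    """Parse CSV text and convert the dd-mm-yyyy Date column to date objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = datetime.strptime(row["Date"], "%d-%m-%Y").date()
        rows.append(row)
    return rows


def excursion(value, low, high):
    """Return 0.0 for a passing test, otherwise the CCME excursion."""
    if high is not None and value > high:
        return value / high - 1
    if low is not None and value < low:
        return low / value - 1  # assumes strictly positive measurements
    return 0.0


def ccme_wqi(rows, objectives):
    """Compute the CCME WQI score (0-100) over all rows and parameters."""
    total_tests = failed_tests = 0
    tested, failed = set(), set()
    excursion_sum = 0.0
    for row in rows:
        for name, (low, high) in objectives.items():
            raw = (row.get(name) or "").strip()
            if not raw:
                continue  # missing measurement: not counted as a test
            total_tests += 1
            tested.add(name)
            exc = excursion(float(raw), low, high)
            if exc > 0:
                failed_tests += 1
                failed.add(name)
                excursion_sum += exc
    if total_tests == 0:
        raise ValueError("no measurements to evaluate")
    f1 = 100 * len(failed) / len(tested)   # scope: share of failed parameters
    f2 = 100 * failed_tests / total_tests  # frequency: share of failed tests
    nse = excursion_sum / total_tests      # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)         # amplitude
    return 100 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732


def classify(score):
    """Map a score to the standard CCME category names."""
    for floor, label in ((95, "Excellent"), (80, "Good"), (65, "Fair"), (45, "Marginal")):
        if score >= floor:
            return label
    return "Poor"


rows = load_rows(SAMPLE)
score = ccme_wqi(rows, OBJECTIVES)
print(round(score, 1), classify(score))
```

In practice the index would be computed per group (for example per country, area, and year) rather than over the whole file, and the objectives would come from the applicable water quality guidelines.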
Provider:
figshare
Created:
2025-02-23



