A Comprehensive Surface Water Quality Monitoring Dataset (1940-2023): 2.82 Million Record Resource for Empirical and ML-Based Research
Source: DataCite Commons | Updated: 2025-06-01 | Indexed: 2025-05-07
Download link:
https://figshare.com/articles/dataset/A_Comprehensive_Surface_Water_Quality_Monitoring_Dataset_1940-2023_2_82Million_Record_Resource_for_Empirical_and_ML-Based_Research/27800394/1
Resource description:
<b>Data Description</b><br>
<b>Water Quality Parameters</b>: Ammonia, BOD, DO, Orthophosphate, pH, Temperature, Nitrogen, Nitrate.<br>
<b>Countries/Regions</b>: United States, Canada, Ireland, England, China.<br>
<b>Years Covered</b>: 1940-2023.<br>
<b>Data Records</b>: 2.82 million.<br>
<b>Definition of Columns</b><br>
<b>Country</b>: Name of the water-body region.<br>
<b>Area</b>: Name of the area in the region.<br>
<b>Waterbody Type</b>: Type of the water-body source.<br>
<b>Date</b>: Date of the sample collection (dd-mm-yyyy).<br>
<b>Ammonia (mg/l)</b>: Ammonia concentration.<br>
<b>Biochemical Oxygen Demand (BOD) (mg/l)</b>: Oxygen demand measurement.<br>
<b>Dissolved Oxygen (DO) (mg/l)</b>: Concentration of dissolved oxygen.<br>
<b>Orthophosphate (mg/l)</b>: Orthophosphate concentration.<br>
<b>pH (pH units)</b>: pH level of water.<br>
<b>Temperature (°C)</b>: Temperature in Celsius.<br>
<b>Nitrogen (mg/l)</b>: Total nitrogen concentration.<br>
<b>Nitrate (mg/l)</b>: Nitrate concentration.<br>
<b>CCME_Values</b>: Calculated water quality index values using the CCME WQI model.<br>
<b>CCME_WQI</b>: Water Quality Index classification based on CCME_Values.<br>
<b>Data Directory Description:</b><br>
<b>Category 1: Dataset</b><br>
<b>Combined Data: </b>This folder contains two files: <i>Combined_dataset.csv</i> and <i>Summary.xlsx</i>. The <i>Combined_dataset.csv</i> file includes all eight water quality parameter readings across the five countries, with additional data for initial preprocessing steps such as missing value handling, outlier detection, and other operations. It also contains the CCME Water Quality Index calculation for empirical analysis and ML-based research. The <i>Summary.xlsx</i> file provides a brief description of the datasets, including data distributions (e.g., maximum, minimum, mean, standard deviation).<br><i>Combined_dataset.csv</i>, <i>Summary.xlsx</i><br>
<b>Country-wise Data: </b>This folder contains a separate dataset for each country as CSV files. Each file includes the eight water quality parameters for regional analysis.
The <i>Summary_country.xlsx</i> file presents country-wise dataset descriptions with data distributions (e.g., maximum, minimum, mean, standard deviation).<br><i>England_dataset.csv</i>, <i>Canada_dataset.csv</i>, <i>USA_dataset.csv</i>, <i>Ireland_dataset.csv</i>, <i>China_dataset.csv</i>, <i>Summary_country.xlsx</i><br>
<b>Category 2: Code</b><br>Data processing and harmonization code (e.g., language conversion, date conversion, parameter naming and unit conversion, missing value handling, WQI measurement and classification).<br><i>Data_Processing_Harmonnization.ipynb</i><br>
The code used for technical validation (e.g., assessing the data distribution, outlier detection, water quality trend analysis, and verifying the application of the dataset for ML models).<br><i>Technical_Validation.ipynb</i><br>
<b>Category 3: Data Collection Sources</b><br>This category includes links to the selected data sources that were used to create the dataset, provided so that the dataset can be reconstructed or extended.<br><i>DataCollectionSources.xlsx</i><br>
<b>Abstract</b><br>Surface water quality monitoring is crucial for food security, public health, and ecosystem preservation. While water quality monitoring is a well-established practice, there has been limited effort to compile a comprehensive and harmonized dataset for global surface water. This study introduces an international surface water quality dataset that preserves the spatial and temporal variability, integrity, consistency, and depth of the data, enabling empirical evaluation, data-driven prediction, and forecasting of water quality. The dataset was compiled from a range of sources, including regional and global water quality databases, water management organizations, and research projects across five leading countries (USA, Canada, Ireland, England, and China). It consists of 2.82 million measurements of eight water quality parameters spanning from 1940 to 2023. 
This dataset supports meta-analysis of water quality models and facilitates machine-learning-based investigations into the spatial and temporal drivers and patterns of surface water quality at regional to global scales.
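As a rough illustration of how the documented columns might be used, the sketch below parses a few rows, reads the Date field in the stated dd-mm-yyyy format, and computes an index with the standard CCME WQI formula (scope F1, frequency F2, amplitude F3). Several things here are assumptions rather than facts about the dataset: the CSV headers are assumed to match the column names listed above exactly, the sample rows are made up, the objective limits in `OBJECTIVES` are hypothetical placeholders, and the helper names are invented for this example. The authors' own calculation lives in their notebook and may differ.

```python
import csv
import io
import math
from datetime import datetime

# Made-up sample rows; headers assumed to match the documented column names.
SAMPLE = """Country,Area,Waterbody Type,Date,Dissolved Oxygen (DO) (mg/l),pH (pH units),Nitrate (mg/l)
Ireland,Area A,River,03-02-2020,9.1,7.4,2.0
Ireland,Area A,River,17-06-2020,5.2,7.9,16.0
Ireland,Area A,River,21-09-2020,4.0,9.5,5.0
"""

# Hypothetical objectives as (lower limit, upper limit); None means no limit.
OBJECTIVES = {
    "Dissolved Oxygen (DO) (mg/l)": (6.0, None),
    "pH (pH units)": (6.5, 9.0),
    "Nitrate (mg/l)": (None, 13.0),
}


def load_rows(text):
    """Parse CSV text and convert the Date column from dd-mm-yyyy."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Date"] = datetime.strptime(row["Date"], "%d-%m-%Y").date()
        rows.append(row)
    return rows


def excursion(value, low, high):
    """Return 0.0 if the value meets its objective, else the CCME excursion."""
    if high is not None and value > high:
        return value / high - 1.0
    if low is not None and value < low:
        if value <= 0:
            raise ValueError("excursion below a lower limit needs value > 0")
        return low / value - 1.0
    return 0.0


def ccme_wqi(rows, objectives):
    """Compute the CCME WQI from rows of readings and per-column objectives."""
    tested_vars, failed_vars = set(), set()
    total_tests, failed_tests, excursion_sum = 0, 0, 0.0
    for row in rows:
        for column, (low, high) in objectives.items():
            raw = (row.get(column) or "").strip()
            if not raw:
                continue  # a missing reading is not counted as a test
            total_tests += 1
            tested_vars.add(column)
            e = excursion(float(raw), low, high)
            if e > 0:
                failed_vars.add(column)
                failed_tests += 1
                excursion_sum += e
    if total_tests == 0:
        raise ValueError("no readings available for the given objectives")
    f1 = 100.0 * len(failed_vars) / len(tested_vars)  # scope
    f2 = 100.0 * failed_tests / total_tests           # frequency
    nse = excursion_sum / total_tests                 # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                    # amplitude
    return 100.0 - math.sqrt(f1**2 + f2**2 + f3**2) / 1.732


def classify(wqi):
    """Map an index value to the standard CCME category names."""
    for threshold, name in [(95, "Excellent"), (80, "Good"), (65, "Fair"), (45, "Marginal")]:
        if wqi >= threshold:
            return name
    return "Poor"


rows = load_rows(SAMPLE)
wqi = ccme_wqi(rows, OBJECTIVES)
category = classify(wqi)
print(f"CCME WQI = {wqi:.2f} ({category})")
```

In practice the same functions could be applied per Country, Area, or year by grouping the rows first; the grouping level used for the published CCME_Values column is not stated in the description.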
Provider:
figshare
Created:
2025-01-30



