Ascii grids of predicted pH in depth zones used by domestic and public drinking water supply depths, Central Valley, California

Source: DataONE. Updated 2017-04-12; indexed 2024-06-26.

Download link:

https://search.dataone.org/view/95212e65-d952-4753-85a3-e1e6f6df872f

Summary:

The ASCII grids in this data release are predicted distributions of continuous pH at the drinking-water depth zones in the groundwater of the Central Valley, California. The two prediction grids produced in this work represent predicted pH at the domestic-supply and public-supply drinking-water depths, respectively. Both are bounded by the alluvial boundary that defines the Central Valley.

A depth of 46 m was used to stratify wells into the shallow and deep aquifer. This threshold was derived from depth percentiles associated with domestic and public supply in previous work by Burow et al. (2013). In this work, the median depth of wells categorized as domestic supply was 30 m below land surface. The median depth of wells categorized as public supply was 100 m below land surface.

The prediction grids were created with Boosted Regression Trees (BRT) using a Gaussian error distribution. The models were fit within a statistical learning framework implemented in the R computing environment (http://www.r-project.org/). This framework seeks to maximize the predictive performance of machine learning methods by tuning the model through cross-validation.

The response variable was measured pH from 1337 wells, compiled from two sources:
- the US Geological Survey (USGS) National Water Information System (NWIS) database (all data are publicly available from the USGS: http://waterdata.usgs.gov/ca/nwis/nwis);
- the California State Water Resources Control Board Division of Drinking Water (SWRCB-DDW) database (water-quality data are publicly available from the SWRCB: http://www.waterboards.ca.gov/gama/geotracker_gama.shtml).

Only wells with both measured pH and well-depth data were selected. For wells with multiple records, only the most recent sample in the period 1993-2014 was used. Of these wells, 1003 (the training dataset) were used to train the BRT model, and 334 (the hold-out dataset) were used to validate it.
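The well-selection and depth-stratification rules can be sketched in Python as follows. This is a minimal illustration, not the authors' workflow, which used R.

It rests on several assumptions:
- The record layout (`Sample` with `well_id`, `sample_date`, `ph`, `well_depth_m`) and all function names are hypothetical.
- A well exactly 46 m deep is treated as deep.
- The 1993-2014 period is read as inclusive calendar years.
- The hold-out split is assumed to be random, because the source gives only the counts.

```python
import random
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Depth (m below land surface) separating shallow and deep wells.
# Assumption: a well exactly at 46 m is treated as deep.
SPLIT_DEPTH_M = 46.0


@dataclass
class Sample:
    """Hypothetical record layout: one pH measurement at one well."""
    well_id: str
    sample_date: date
    ph: Optional[float]
    well_depth_m: Optional[float]


def select_recent_samples(samples, start=date(1993, 1, 1), end=date(2014, 12, 31)):
    """Per well, keep the most recent sample in [start, end] that has pH and depth."""
    latest = {}
    for s in samples:
        if s.ph is None or s.well_depth_m is None:
            continue  # wells need both measured pH and well depth
        if not (start <= s.sample_date <= end):
            continue  # outside the assumed 1993-2014 window
        kept = latest.get(s.well_id)
        if kept is None or s.sample_date > kept.sample_date:
            latest[s.well_id] = s
    return list(latest.values())


def depth_zone(well_depth_m):
    """Classify a well as shallow or deep using the 46 m threshold."""
    return "shallow" if well_depth_m < SPLIT_DEPTH_M else "deep"


def split_holdout(items, n_holdout, seed=0):
    """Assumed random split; returns (training, hold_out) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_holdout:], shuffled[:n_holdout]
```

With 1337 selected wells, `split_holdout(wells, n_holdout=334)` returns 1003 training wells and 334 hold-out wells. Fixing the seed makes the illustrative split reproducible.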
The training r-squared was 0.70 and the RMSE was 0.26 standard pH units. The hold-out r-squared was 0.43 and the RMSE was 0.37 standard pH units.

More than 60 predictor variables from 7 sources (see metadata) were assembled. Together they let the model incorporate regional-scale soil properties, soil chemistry, land use, aquifer textures, and aquifer hydrology. Outputs of previously developed Central Valley models were used to represent aquifer textures and groundwater hydraulics:
- aquifer textures from the Central Valley Textural Model (CVTM; Faunt et al., 2010);
- MODFLOW-simulated vertical water fluxes and predicted depth to water table from the Central Valley Hydrologic Model (CVHM; Faunt, 2009).

In this work, predictor-variable values were assigned to wells in ArcGIS using a 500-m buffer. The relative influence of each predictor variable in the final BRT model used for mapping, as defined by Friedman (2001), can be downloaded from this landing page (file PredictorVariableInfluence_CentralValley_pH_BRT.csv).
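The two reported fit statistics can be computed as sketched below. The source does not state which definition of r-squared was used, so this sketch assumes the coefficient of determination, 1 - SS_res/SS_tot. Some studies instead report the squared Pearson correlation, which can differ on hold-out data. The function names and the toy values are illustrative only and are not data from this release.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the response's units (here, standard pH units)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sse / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # zero if all values equal
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
obs = [7.0, 7.5, 8.0]
pred = [7.1, 7.4, 8.2]
print(round(rmse(obs, pred), 3), round(r_squared(obs, pred), 2))  # prints: 0.141 0.88
```

Because RMSE is expressed in the response's own units, the hold-out value of 0.37 can be read directly as a typical prediction error in pH units. A gap between training and hold-out statistics, as reported here, is the usual sign that the training fit is optimistic.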

本数据发布配套的ASCII格网（ASCII grids），为美国加利福尼亚州中央谷地地下水饮用水深度带的连续pH值预测分布。本研究生成的两张预测格网，分别对应生活供水与公共供水饮用水深度下的pH预测值，其空间范围被界定中央谷地的冲积边界所限定。本研究采用46米作为分界深度，将水井划分为浅层与深层含水层，该分界深度源自Burow等人（2013）过往研究中与生活供水、公共供水相关的深度百分位数。本研究中，归类为生活供水的水井中位埋深为地表下30米，归类为公共供水的水井中位埋深为地表下100米。本研究通过预测建模方法生成预测格网，具体为在R语言计算框架（http://www.r-project.org/）的统计学习框架下，采用带有高斯误差分布的提升回归树（Boosted Regression Trees, BRT）模型。该统计学习框架通过交叉验证进行模型调优，以最大化机器学习方法的预测性能。本研究的响应变量为1337口水井的实测pH值，数据源自两个公开数据源：美国地质调查局（US Geological Survey, USGS）国家水信息系统（National Water Information System, NWIS）数据库（所有数据可从USGS公开获取：http://waterdata.usgs.gov/ca/nwis/nwis），以及加州州水资源控制委员会饮用水分部（California State Water Resources Control Board Division of Drinking Water, SWRCB-DDW）数据库（水质数据可从SWRCB公开获取：http://www.waterboards.ca.gov/gama/geotracker_gama.shtml）。本研究仅选取带有实测pH值与水井埋深数据的水井；对于存在多条记录的水井，仅采用1993-2014年间的最新采样数据。共计1003口水井作为训练数据集（training dataset）用于训练BRT模型，另有334口水井作为预留验证数据集（hold-out dataset）用于验证该预测模型。训练集的决定系数（r-squared）为0.70，以标准pH单位计的均方根误差（Root Mean Square Error, RMSE）为0.26。预留验证集的决定系数为0.43，以标准pH单位计的均方根误差为0.37。本研究共整合来自7个数据源的60余个预测变量（详见元数据），以构建涵盖区域尺度土壤属性、土壤化学、土地利用、含水层岩性以及含水层水文的预测模型。本研究采用既往开发的中央谷地模型输出结果作为输入数据：其中中央谷地岩性模型（Central Valley Textural Model, CVTM; Faunt等人，2010）的输出用于表征含水层岩性，MODFLOW模拟的垂直水通量与预测地下水位埋深数据（源自中央谷地水文模型Central Valley Hydrologic Model, CVHM; Faunt, 2009）则用于表征地下水水动力条件。本研究通过ArcGIS软件的500米缓冲区分析，为各水井赋予对应的预测变量值。本研究最终用于制图的BRT模型中各预测变量的重要性结果（按Friedman（2001）提出的方法定义）可从本登陆页面下载，对应文件名为PredictorVariableInfluence_CentralValley_pH_BRT.csv。

Created:

2017-04-13