Dataset of runoff variation and its driving mechanisms in headwater catchments of the Yellow River water conservation area (2000-2020)
Source: DataCite Commons | Updated: 2026-04-24 | Indexed: 2026-05-05
Download link:
https://www.scidb.cn/detail?dataSetId=81974e67d13b482f9bc47f421a01d86b
Description:
This dataset documents changes in runoff regime, and the assessed contributions of key driving factors, for 31 representative small headwater catchments in the Yellow River water conservation area from 2000 to 2020. The region spans six provinces and autonomous regions (Qinghai, Sichuan, Gansu, Ningxia, Shaanxi, and Henan), and the 31 selected catchments range in area from 202 km² to 20,930 km². During preprocessing, the research team applied systematic quality control to the raw daily runoff records of the 31 hydrological stations and filled a small number of missing values by linear interpolation to keep the series consistent and reliable. Meteorological and underlying-surface factors were derived by spatial interpolation followed by spatial averaging within each catchment. Three runoff metrics were calculated from the daily flow data: annual mean flow, high flow, and low flow. Time-series trends were estimated with the robust, non-parametric Theil-Sen estimator, and their significance was assessed with the Mann-Kendall test (p < 0.05). Driver contributions were obtained with a data-driven framework: an XGBoost (eXtreme Gradient Boosting) regression model, with hyperparameters tuned automatically by Optuna, simulates runoff, and SHAP (SHapley Additive exPlanations), an interpretability method grounded in cooperative game theory, quantifies the marginal contributions of climate and underlying-surface factors to the simulated runoff. Model uncertainty and error ranges were evaluated with five-fold stratified cross-validation, and the 95% confidence interval was calculated directly from the statistical distribution of the validation samples to characterize the robustness of the feature contributions.
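The trend step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the dataset authors' code: the function names `theil_sen_slope` and `mann_kendall` are hypothetical, the series is assumed to be evenly spaced (one value per year), the p-value uses the usual normal approximation with a tie correction, and the example series is synthetic.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def theil_sen_slope(values):
    """Theil-Sen slope: median of all pairwise slopes, assuming unit time steps."""
    return median(
        (values[j] - values[i]) / (j - i)
        for i, j in combinations(range(len(values)), 2)
    )


def mann_kendall(values):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test."""
    n = len(values)
    # S counts concordant minus discordant pairs.
    s = sum(
        (values[j] > values[i]) - (values[j] < values[i])
        for i, j in combinations(range(n), 2)
    )
    # Variance of S with the standard correction for tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(values).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Synthetic 21-year series (2000-2020) with a steady increase; not real data.
flows = [10.0 + 0.5 * k for k in range(21)]
slope = theil_sen_slope(flows)
s_stat, z_stat, p_value = mann_kendall(flows)
significant = int(p_value < 0.05)  # same 1/0 convention as the dataset's flag
print(slope, s_stat, significant)
```

Pure-Python pairwise loops are fine for 21 annual values; a longer series would usually call for a library implementation instead.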
The dataset contains two tables stored in the widely supported CSV (comma-separated values) format, which can be opened with common spreadsheet software such as Excel or WPS, or read from a programming language. The first file (Figure 3.csv) records the spatiotemporal evolution of runoff regimes in the small catchments. It contains 31 records, with row labels giving the hydrological station names (such as Huanghe River and Tangke). For each catchment, the columns record the multi-year mean values of mean flow, low flow, and high flow (in m³/s), the trend slope, and a Boolean flag for the significance test (1 = significant, 0 = not significant). The second file (Figure 4a - left.csv) records the contribution statistics of 19 climate and underlying-surface factors under mean-flow conditions. It contains 19 records, with row labels giving the factor names (such as precipitation, potential evapotranspiration, and forest proportion). The columns give the feature contribution strength index (mean-abs_SHAP), based on the absolute SHAP values across years, together with the 5% quantile (abs_SHAP_5quantile) and 95% quantile (abs_SHAP_95quantile), which serve as lower and upper bounds reflecting model uncertainty.
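As a rough sketch of how the factor-contribution table might be used, the snippet below ranks factors by mean-abs_SHAP with Python's standard `csv` module. The three SHAP column names come from the description above; the row-label column name `factor`, the helper `rank_factors`, and every numeric value in the in-memory sample are placeholders invented for illustration, so the real header should be checked before reuse.

```python
import csv
import io


def rank_factors(csv_file, name_col="factor", score_col="mean-abs_SHAP"):
    """Return (name, mean, q05, q95) tuples sorted by descending mean |SHAP|.

    name_col is an assumed header for the row-label column.
    """
    rows = [
        (
            row[name_col],
            float(row[score_col]),
            float(row["abs_SHAP_5quantile"]),
            float(row["abs_SHAP_95quantile"]),
        )
        for row in csv.DictReader(csv_file)
    ]
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Placeholder rows with made-up numbers; these are NOT values from the dataset.
sample = io.StringIO(
    "factor,mean-abs_SHAP,abs_SHAP_5quantile,abs_SHAP_95quantile\n"
    "forest proportion,0.10,0.05,0.15\n"
    "precipitation,0.40,0.30,0.50\n"
    "potential evapotranspiration,0.20,0.10,0.30\n"
)
ranked = rank_factors(sample)
for name, mean_abs, q05, q95 in ranked:
    print(f"{name}: {mean_abs:.2f} [{q05:.2f}, {q95:.2f}]")
```

To run this against the downloaded file, pass an open file handle instead of the in-memory sample, for example `open("Figure 4a - left.csv", newline="", encoding="utf-8")`, adjusting the path, encoding, and `name_col` to match the actual file.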
Provider:
Science Data Bank
Created:
2026-04-24



