Data from: Interactions of wood accumulations, channel dynamics, and geomorphic heterogeneity within a river corridor

Mendeley Data | Updated 2024-05-10 | Indexed 2024-06-28

Download link:

https://datadryad.org/stash/dataset/doi:10.5061/dryad.k0p2ngff3


Resource description:

# Data from: Interactions of wood accumulations, channel dynamics, and geomorphic heterogeneity within a river corridor

[https://doi.org/10.5061/dryad.k0p2ngff3](https://doi.org/10.5061/dryad.k0p2ngff3)

All of our data were derived from publicly available data sources and are reproducible. We include the locations for data acquisition as well as scripts developed in Google Earth Engine for imagery acquisition/analyses and in R for calculation of spatial heterogeneity metrics.

## Description of the data and file structure

This dataset contains a .xlsx file with the raw numerical data. There are three tabs within the .xlsx file.

Tab 1 "2013-2022" contains the data for each of our multi-year river corridor variables. Each column represents a different variable:

* "Site": the segment the data are associated with (no units)
* "Year": the imagery year the data are from (year)
* "Wood\_Dis\_Dens": the distribution density of wood accumulations (number of wood accumulations/river corridor area)
* "Wood\_Count": the total number of wood accumulations (number of logjams/segment)
* "Total\_Sinuosity": total sinuosity (meters of active channel length/meters of valley length)
* "Migration": average channel migration (square meters of migration area/meters of migration length)
* "Peak\_Discharge": the annual peak discharge (cubic meters per second)

Tab 2 "Integrated time" contains the data that are not separated by year and only includes data for 2022. Once again, each column represents a different variable. In addition to the variables listed above, this tab includes:

* "beaver\_meadow": the proportion of beaver meadow area (sq. meters) to floodplain area (sq. meters)
* "sticky\_sites": the number of wood accumulations that persisted throughout the years of this study (count)
* "aggregation", "interspersion", "patch\_density", and "evenness": spatial heterogeneity metrics, all measured as a % with the exception of patch density, which is measured as patches/100 ha

Tab 3 "Site Comparisons" contains data from 14 additional rivers in the U.S. alongside our site. None of the variables in this tab are new, but the column "Site" contains the names of the rivers used for comparison of heterogeneity metrics (aggregation, interspersion, patch density, and evenness).

A detailed description of how the variables in each column were collected is included below. For clarity, we have structured the description of data and data acquisition information by sections that correspond with the methods of the companion manuscript, Marshall et al. (2023).

### Data for wood accumulations and beaver modifications

* These data were collected manually using Google Earth imagery. We conducted manual aerial wood accumulation surveys using available Google Earth imagery between 2013 and 2022 (four years of available imagery: 2013, 2016, 2020, 2022). We mapped all logjams that could be detected via the aerial imagery. Wood accumulations that were under canopy, too small for the spatial resolution of imagery, not interacting with base flows, or containing fewer than three visible wood pieces were not included. We recorded the number of wood accumulations per 2-km segment for each available imagery year as a minimum wood-accumulation count and divided the wood count by floodplain area for each segment to get the wood distribution density. We also noted the occurrence of persistent wood accumulations that were continually present in the Google Earth imagery, in what we refer to as "sticky sites".
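The two wood metrics described above reduce to a ratio and a set intersection. The following is a minimal Python sketch of that arithmetic, not the authors' workflow (the surveys were done by hand in Google Earth). The `SegmentSurvey` type, the jam identifiers, and every number below are hypothetical and invented for illustration; the sketch also assumes each mapped jam location has been given a stable identifier across imagery years.

```python
from dataclasses import dataclass


@dataclass
class SegmentSurvey:
    """One segment in one imagery year (hypothetical structure)."""
    site: str               # segment label, naming is an assumption
    year: int               # imagery year
    wood_count: int         # minimum count of mapped wood accumulations
    floodplain_area: float  # floodplain area of the segment, any consistent unit


def wood_distribution_density(survey: SegmentSurvey) -> float:
    """Wood count divided by floodplain area for one segment-year."""
    return survey.wood_count / survey.floodplain_area


def sticky_sites(jams_by_year: dict[int, set[str]]) -> set[str]:
    """Jam identifiers that appear in every surveyed imagery year."""
    return set.intersection(*jams_by_year.values())


# Illustrative values only, not taken from the dataset.
example = SegmentSurvey(site="S1", year=2022, wood_count=12, floodplain_area=48.0)
example_density = wood_distribution_density(example)

example_jams = {
    2013: {"J1", "J2", "J3"},
    2016: {"J1", "J3", "J4"},
    2020: {"J1", "J3"},
    2022: {"J1", "J3", "J5"},
}
example_sticky = sticky_sites(example_jams)
```

Here `len(example_sticky)` would correspond to the "sticky\_sites" count for a segment, under the assumption that persistence means presence in all four imagery years.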
### Data for channel dynamism and annual peak discharge

* To collect average channel migration data, we used a combination of Google Earth Engine and ArcGIS Pro. To measure active channel migration, we developed a semi-automated approach to map surface water extent and planimetric centerline movement (code included). Surface water extent was delineated for 2013, 2016, 2020, and 2022 to keep the timestep consistent with our wood surveys. We used publicly available satellite imagery. Imagery collected for the National Agriculture Imagery Program (NAIP) was used when available (2013 and 2016). For 2020 and 2022, cloud-free multispectral composite images were created in Google Earth Engine from Sentinel-2 imagery from average baseflow months (August-October). Surface water was classified using the normalized difference water index (NDWI) for NAIP imagery and the modified normalized difference water index (MNDWI) for Sentinel-2 imagery. A unique threshold was empirically determined for each year to optimize the identification of the river surface while minimizing false-positive water identification, resulting in binary water and non-water masks for each year. Gaps and voids in the Sentinel-2 derived water masks (from shadow-covered areas, thin river segments, or mixed pixels along the river edge) were filled by sequentially buffering the water areas outwards by 30 m (three pixels) and then inwards by 15 m. Similarly, gaps and voids in NAIP-derived water masks were filled using a sequential 20 m outwards then inwards buffer. The resulting binary water masks were imported into ArcGIS Pro (academic license) and vectorized. Manual adjustments were made to remove any remaining misclassified areas and join disconnected segments. We delineated centerlines of our channel masks using the ArcGIS Pro Polygon to Centerline tool. When multiple channels were present, the dominant channel branch was chosen for the channel centerline.
Consequently, our analysis represents a minimum value of channel migration during each time step because it does not include secondary channel movements. The Feature to Polygon tool was used to extract area differences between two centerlines at each segment. Areas between the centerlines for each segment were divided by centerline length to get a horizontal change distance.
* We measured total sinuosity manually in each 2-km segment for 2013, 2016, 2020, and 2022 using Google Earth imagery and the built-in Measure tool in Google Earth (https://earth.google.com/web/). We measured total sinuosity as the ratio of total channel length of all active channels to valley length.
* We obtained annual peak discharge from the nearest US Geological Survey gauge (12370000, Swan River near Bigfork, MT).

### Data for geomorphic heterogeneity

* Data included in the geomorphic heterogeneity remote sensing analysis were: a Sentinel-2 imagery mosaic prepared in Google Earth Engine (code included), and normalized difference vegetation index (NDVI) and normalized difference moisture index (NDMI) rasters calculated from the Sentinel-2 mosaic in ArcGIS Pro. The Sentinel mosaic was prepared for the approximate growing season in Montana, USA (June 1 to October 31), based on available annual phenology activity curves (2018-2022) from the US National Phenology Network of the existence of leaves or needles on flowering plants.
* We performed an unsupervised remote sensing classification on a stack of data containing a 2022 Sentinel-2 imagery mosaic prepared in GEE, and NDVI and NDMI rasters calculated from the Sentinel-2 mosaic in ArcGIS Pro. The *ISO Cluster Unsupervised Classification* ArcGIS Pro tool was used to perform the classification. Inputs to the tool were a maximum of 10 classes, a minimum class size of 20 pixels (tool default), and a sample interval of 10 pixels (tool default).
The entire reach was classified once, and then clipped into individual 2-km segments.

### Statistical analyses

* Statistical analyses were conducted in R (open source). We used an alpha (probability of rejecting the null hypothesis when the null hypothesis is true) of 0.05 in all statistical analyses.
* To understand the influence of time on our river corridor variables, we examined whether there was significant variation in the medians of river corridor variables between timesteps using a Kruskal-Wallis Rank Sum test. For any variable where there was a significant change between timesteps, we used a Dunn Test to determine exactly which groups were different. We also conducted the same exploratory statistical analysis to understand whether there was any significant variation in medians for each variable between segments.
* To understand the predictors of wood distribution density and wood count (hypothesis *i*), we ran a multiple regression model with total sinuosity, channel migration, and peak annual discharge as predictor variables. We performed a stepwise model selection with the stepAIC function from the *MASS* package to provide a relative indication of the quality of each statistical model for our given set of data.
* To address hypotheses *i* and *ii*, we calculated both Pearson (*r*) and Kendall (τ) correlation coefficients. Given our small sample, we report both *r* and τ values. All correlation coefficients were calculated using the *cor.test()* function in base R. Hypothesis *iii* was further addressed through multiple linear regression models and stepwise model selection to determine whether proportion of beaver meadows, sticky sites, total sinuosity, channel migration, wood distribution density, or wood count are clear predictors of spatial heterogeneity metrics. We only used data from 2022 to keep a consistent timestep across all variables. Response variables include four spatial heterogeneity metrics: aggregation, interspersion, patch density, and evenness.
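To make the two correlation coefficients concrete, here is a small pure-Python sketch of Pearson's *r* and Kendall's τ. It is an illustration only: the analysis itself was run with *cor.test()* in R, the tau-b tie handling used here is an assumption about how ties should be treated, and the paired values are invented rather than drawn from the dataset. No p-values are computed.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Kendall rank correlation (tau-b), which adjusts for ties."""
    concordant = discordant = only_x_tied = only_y_tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied in both variables: ignored
        elif dx == 0:
            only_x_tied += 1
        elif dy == 0:
            only_y_tied += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_in_x = concordant + discordant + only_y_tied
    untied_in_y = concordant + discordant + only_x_tied
    return (concordant - discordant) / math.sqrt(untied_in_x * untied_in_y)


# Invented paired observations, e.g. a predictor and a response per segment.
predictor = [1, 2, 3, 4, 5]
response = [2, 1, 4, 3, 5]

r_value = pearson_r(predictor, response)        # approximately 0.8
tau_value = kendall_tau_b(predictor, response)  # approximately 0.6
```

With a small sample, the two coefficients can differ noticeably because τ depends only on the ordering of pairs while *r* depends on the magnitudes, which is one reason to report both.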

All calculated river corridor metrics are included as an .xlsx file.

## Sharing/Access information

Publicly available data used in this study can be found in the following locations:

* NAIP and Sentinel-2 imagery: Google Earth Engine Code (see links in the Code/Software section)
* Google Earth imagery: https://earth.google.com/web/
* USGS gage data: https://waterdata.usgs.gov/monitoring-location/12370000/#parameterCode=00065&period=P7D&showMedian=true

## Code/Software

Software Used

* Google Earth Engine: https://code.earthengine.google.com/
* R (version 4.2.3): open-source and available for download: https://www.r-project.org/
* ArcGIS Pro (version 3.1): academic license needed

Google Earth Engine Code:

* Channel migration and surface water mapping: [https://code.earthengine.google.com/572a187e196bcd3d32fdb6b7d38003f9](https://code.earthengine.google.com/572a187e196bcd3d32fdb6b7d38003f9)
* Geomorphic heterogeneity imagery acquisition: [https://code.earthengine.google.com/fa6632e24a67a5ba8c9d6e44f92016be](https://code.earthengine.google.com/fa6632e24a67a5ba8c9d6e44f92016be)

R Code used to calculate heterogeneity metrics: uploaded as Swan\_River\_Analysis\_landscapemetrics.R
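For readers without access to Earth Engine or ArcGIS Pro, the core arithmetic of the channel-dynamism workflow described earlier can be sketched in plain Python. This is a hedged illustration, not a port of the linked scripts: the band formulas are the standard NDWI (green and near-infrared) and MNDWI (green and shortwave-infrared) definitions, while the function names, the threshold of 0.0, and all numeric inputs are hypothetical. The per-year thresholds the authors determined empirically are not given in this README.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b); returns 0.0 when both inputs are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndwi(green: float, nir: float) -> float:
    """Normalized difference water index from green and near-infrared."""
    return normalized_difference(green, nir)


def mndwi(green: float, swir: float) -> float:
    """Modified NDWI from green and shortwave-infrared."""
    return normalized_difference(green, swir)


def water_mask(index_grid, threshold):
    """Binary mask: True where the water index exceeds the threshold."""
    return [[value > threshold for value in row] for row in index_grid]


def migration_distance(area_between_centerlines_m2, centerline_length_m):
    """Horizontal change distance: area between centerlines / length."""
    return area_between_centerlines_m2 / centerline_length_m


def total_sinuosity(active_channel_lengths_m, valley_length_m):
    """Summed length of all active channels divided by valley length."""
    return sum(active_channel_lengths_m) / valley_length_m


# Hypothetical reflectances for a 1 x 2 pixel strip (water, then vegetation).
green_band = [[0.30, 0.10]]
nir_band = [[0.10, 0.30]]
index = [[ndwi(g, n) for g, n in zip(g_row, n_row)]
         for g_row, n_row in zip(green_band, nir_band)]
mask = water_mask(index, threshold=0.0)  # threshold is an assumption

# Hypothetical segment measurements in meters and square meters.
shift_m = migration_distance(5000.0, 2000.0)
sinuosity = total_sinuosity([2600.0, 400.0], 2000.0)
```

The gap-filling step (buffering outwards and then inwards) and the centerline extraction are geometric operations that the sketch leaves out; they were carried out in Google Earth Engine and ArcGIS Pro.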


Created:

2024-05-08