
Data from: "Cross-realm transferability of species distribution models – species characteristics matter more than modelling methods applied"

Mendeley Data (updated 2024-06-27, indexed 2024-06-28)
Download link:
https://zenodo.org10302827
Resource description:
Abstract

This dataset contains presence-absence occurrence observations of 11 aquatic macrophytes from the Bothnian Sea and Lake Puruvesi, together with the environmental covariates used to build the species distribution models (SDMs) in the paper "Cross-realm transferability of species distribution models – species characteristics matter more than modelling methods applied". At the moment, the data and the code are supplied anonymously for double-blind peer review. In addition to the data files, R code for fitting the SDMs and R code for replicating the analysis conducted in the paper are supplied. The data are stored in RData format (point data) without coordinate information, due to data policy restrictions. The species in the data are Isoëtes lacustris, Isoëtes echinospora, Ranunculus reptans, Ranunculus schmalhausenii, Potamogeton berchtoldii, Potamogeton perfoliatus, Potamogeton gramineus, Myriophyllum alterniflorum, Equisetum fluviatile, Eleocharis acicularis and Elodea canadensis. The environmental covariates are bottom water salinity, turbidity, sandy substrate occurrence, colored dissolved organic matter (CDOM), surface fetch, sampling depth, total nitrogen, total phosphorus, and distance to the closest Phragmites australis reed bed.

Objective of the study

The modelling objective of the paper was to assess SDM transferability. Transferability was assessed by using models built in marine areas to project the distributions of the target species in Lake Puruvesi (Saimaa, Eastern Finland). Macrophyte mapping data from Lake Puruvesi were used as independent test data against which the transferability of the models was assessed.

Location

The species data were collected from two geographic areas: the Bothnian Bay (Baltic Sea) and Lake Puruvesi (Eastern Finland). The marine observations from the Bothnian Bay were split into three overlapping areas (Areas 1-3) to test the effect of input data gradient length on SDM transferability.
The largest marine sampling area (Area 3) spanned latitudes 62.95-65.91 and longitudes 19.14-27.99. Area 2 spanned latitudes 63.95-65.91 and longitudes 21.53-27.99, and Area 1 spanned latitudes 64.91-65.91 and longitudes 23.82-27.99. The Hummonselkä sub-basin of Lake Puruvesi, where the macrophyte test data were collected, is located at latitudes 61.89-62.05 and longitudes 29.58-29.78.

Species data

The species observations were collected along diving transects placed on the sea or lake floor. Species were recorded in 2 m² grid cells placed every 10 m along the transect or at every 1 m change in depth, whichever criterion was met first. The marine species data were collected in 2010-2020, and the Lake Puruvesi data in 2017. The diver identified all macrophytes within each 2 x 1 m frame to species level, and the data record the presence or absence of each species in each grid cell. The diving transects followed the systematic survey protocol used in the underwater inventories of the Finnish Underwater Biodiversity Survey Program (VELMU). The transects were not randomly distributed but were placed using expert judgement. As all our study species are macroscopic and relatively easy to identify in the field (with the exception of possible confusion between I. echinospora and I. lacustris), we consider the absences in our observation data to be true absences. That said, because the observation area is rather small (2 m²), a species may occur at the site of investigation (e.g. a small lagoon) but lie outside the vegetation sampling grid.

Environmental data

Bottom water salinity

The seasonal mean bottom water salinity was modelled using a generalized additive model (GAM) with mean salinity as the response, a log link, and a gamma error distribution. This was necessary to keep the resulting predictions positive.
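The log link is what keeps predictions positive: fitted values are exp(η) for a linear predictor η, and the exponential is positive for any η. The sketch below illustrates this with NumPy, assuming a plain parametric gamma GLM fitted by iteratively reweighted least squares (IRLS) as a simplified stand-in for the GAM (a GAM would add smooth terms); the predictor, coefficients and synthetic data are hypothetical.

```python
import numpy as np

def fit_gamma_log_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a gamma GLM with a log link by IRLS (Fisher scoring).

    For the gamma family V(mu) = mu**2, and with the log link
    d(eta)/d(mu) = 1/mu, so every IRLS working weight equals 1 and each
    iteration reduces to ordinary least squares on the working response
    z = eta + (y - mu) / mu.
    """
    X = np.column_stack([np.ones(len(y)), X])   # prepend an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                   # start from the null model
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)                         # inverse log link
        z = eta + (y - mu) / mu                  # working response
        new_beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta

def predict(beta, X):
    """Predicted mean response; exp() makes every prediction > 0."""
    X = np.column_stack([np.ones(len(X)), X])
    return np.exp(X @ beta)

# Hypothetical synthetic example: one predictor, gamma noise around exp(1 + 0.03 x).
rng = np.random.default_rng(0)
x = rng.uniform(1, 30, 500)
true_mu = np.exp(1.0 + 0.03 * x)
shape = 20.0
y = rng.gamma(shape, true_mu / shape)            # gamma draws with mean true_mu
beta = fit_gamma_log_glm(x, y)
# Arbitrary predictor values, including extrapolation far outside the data.
pred = predict(beta, np.array([-100.0, 0.0, 50.0]))
```

Even for predictor values far outside the fitted range, `pred` stays positive, which an identity-link Gaussian model would not guarantee.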
Bottom depth, CDOM, river influence and spatial location were used as predictors. Data from 448 locations were used, each with a minimum of three observations. The model was validated using 30 % of the data held out from model fitting. The explained deviance of the model was 0.94, and the correlation between the raw data and the predicted values was 0.95, with few outliers.

Turbidity

Maps of turbidity (in FNU, Formazin Nephelometric Units) were generated from Sentinel-2 MultiSpectral Instrument (MSI) observations using the Case-2 Regional Coast Colour (C2RCC) bio-optical inversion model, which contains separate atmospheric correction and water quality parts. Before running C2RCC, the original 10 m input data were downsampled to 60 m. The C2RCC output variable that correlates with turbidity is the backscattering of total suspended sediments at 443 nm, which was further calibrated into turbidity (FNU) values using SYKE's empirical equations for coastal waters and clear lakes (for a similar approach, see Attila et al. (2013) and Sagerman, Hansen, and Wikström (2020)). Monthly turbidity observations were aggregated into median composites to reduce the effects of cloud cover and other disturbances. Because of low solar elevation and ice cover in winter, the turbidity maps are generated only for the summer months (May to September). The current processing covers the years 2017 to 2021. An average raster layer was created from the monthly observations as input for SDM building.

Probability of sandy substrate

A random forest model was used to classify sandy bottoms from Sentinel-2 MSI satellite images in shallow water areas. The identification of sandy substrate is based on its higher reflectance compared with other substrates. The model was trained and validated using diver-recorded field observations in the Baltic area, and diver-recorded and echo sounding observations in the freshwater area.
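Returning to the turbidity layer, its two-stage compositing (a per-month median over scenes, then an average over months) can be sketched with NumPy. This assumes each month's scenes are stacked into an array of shape (scenes, rows, cols) with cloud-masked pixels set to NaN; the toy values are hypothetical, and this is an illustration rather than Syke's processing chain.

```python
import numpy as np

def monthly_median_composite(scenes):
    """Per-pixel median over one month's scenes, ignoring NaN (masked) pixels.

    scenes: array of shape (n_scenes, rows, cols).
    The median damps single disturbed scenes (e.g. haze or cloud edges).
    """
    return np.nanmedian(scenes, axis=0)

def average_layer(monthly_composites):
    """Per-pixel mean of the monthly composites: one raster for SDM input."""
    return np.nanmean(np.stack(monthly_composites), axis=0)

nan = np.nan
# Hypothetical 1 x 2 pixel rasters; 100.0 mimics one disturbed scene in May.
may = np.array([[[1.0, 5.0]], [[3.0, nan]], [[100.0, 7.0]]])
june = np.array([[[2.0, 4.0]], [[2.0, 6.0]]])
composites = [monthly_median_composite(may), monthly_median_composite(june)]
layer = average_layer(composites)   # May medians [3, 6], June [2, 5]
```

Using the median per month rather than the mean keeps the outlying 100.0 from inflating the May composite.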
For full coverage, including areas beyond the shallow water, the satellite image classification was combined with a boosted regression tree modelling result in the Baltic and with an echo-sounding-based product in the freshwater region. The resulting layers give the probability of sandy substrate at 10 m cell resolution.

CDOM

We used different methods, based on biogeochemical model data and satellite images, to estimate CDOM levels in the Bothnian Bay and Lake Puruvesi. For Lake Puruvesi, we applied the Finnish Environment Institute's (Syke) in-house CDOM algorithm to Sentinel-2 MSI images processed by the C2RCC bio-optical processor (Brockmann et al. 2016). The 10 m resolution observations were aggregated into monthly averages for each month of the summer season (May to October) from 2017 to 2021. For the Bothnian Bay, we used Syke's in-house Sentinel-2 MSI CDOM layers (resolution 60 m) aggregated as seasonal averages (1 July to 7 September). CDOM values are given as the absorption coefficient of CDOM at 400 nm [m⁻¹].

Surface fetch

A surface fetch raster was produced for Lake Puruvesi and the Bothnian Bay. The analysis required a feature layer of the shorelines of Puruvesi and the Baltic Sea. First, we created parallel north-south polylines spanning the whole area of interest, spaced 20 m apart, which is also the resolution of the output raster. These lines were cut wherever they hit the shoreline, and the parts overlapping land were removed. The distance along each remaining line was then calculated, and a point carrying the distance value was created every 20 m. Whenever a line was cut, for example on hitting an island, and started again on the other side, the distance calculation restarted from 0. This created a point dataset with a distance value at each point.
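The line-cutting procedure measures, for each water point, the open-water distance back to the shore along one compass direction. A minimal raster analogue is sketched below with NumPy, assuming a boolean water mask on a square grid and simple ray stepping in place of GIS polyline cutting (a swapped-in technique, not the authors' ArcGIS workflow). The function names and the 20 m default cell size are illustrative; the sketch also averages over several bearings, as the procedure goes on to do.

```python
import math
import numpy as np

def directional_fetch(water, bearing_deg, cell=20.0):
    """Open-water distance (m) from each water cell to land along one bearing.

    water: 2-D boolean array, True = water. Rows increase southwards,
    columns increase eastwards; bearing 0 = north, 90 = east.
    Cells beyond the grid edge are treated as land (assumption), so fetch
    is truncated at the edge. Land cells get NaN.
    """
    rows, cols = water.shape
    dr = -math.cos(math.radians(bearing_deg))   # north = decreasing row index
    dc = math.sin(math.radians(bearing_deg))
    out = np.full(water.shape, np.nan)
    for r in range(rows):
        for c in range(cols):
            if not water[r, c]:
                continue
            steps = 0
            # Step one cell length at a time until land or the grid edge.
            while True:
                rr = round(r + (steps + 1) * dr)
                cc = round(c + (steps + 1) * dc)
                if not (0 <= rr < rows and 0 <= cc < cols) or not water[rr, cc]:
                    break
                steps += 1
            out[r, c] = steps * cell
    return out

def mean_fetch(water, n_directions=16, cell=20.0):
    """Average fetch over evenly spaced bearings (16 -> 22.5 degree steps)."""
    bearings = [i * 360.0 / n_directions for i in range(n_directions)]
    return np.mean([directional_fetch(water, b, cell) for b in bearings], axis=0)

# Hypothetical 1 x 4 strip with one land cell (False).
water = np.array([[True, True, False, True]])
east = directional_fetch(water, 90)
```

The nested Python loops keep the logic explicit; for real rasters a vectorized or GIS-based implementation would be far faster.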
We repeated the procedure 15 times for different compass directions at 22.5 degree intervals and calculated the average fetch for each point location on the 20 m grid from these 15 point layers.

Depth

Depth was measured by the diver with a dive computer while surveying each vegetation grid cell, and the measured depth was used when projecting model results to Puruvesi (transferability performance). In addition, a depth model for the freshwater region was created from Sentinel-2 MSI satellite imagery using the logarithmic band ratio model of the blue and red bands. The model was calibrated using diver-recorded field observations and validated against echo sounding measurements. For more complete coverage, including deep areas, echo soundings from multiple sources were combined with the satellite-derived bathymetry. The cell resolution of the resulting depth layer was 10 m.

Total nitrogen and phosphorus

Mean total nitrogen and phosphorus layers for the marine area were produced for the Finnish EEZ at 20 m spatial resolution using the ArcGIS "Spline with Barriers" tool (Virtanen et al. 2018). Summer (July-September) nutrient measurements from 0 to 10 m depth between 2010 and 2020, obtained from the VESLA database, were used as input data for the interpolation. Nitrogen and phosphorus measurements in Puruvesi between 2010 and 2020 were likewise gathered from the VESLA database. Data from July to September were selected to represent the growing season, and a mean value of total nitrogen (NTOT) and total phosphorus (PTOT) was then calculated for each location. The Spline with Barriers (SwB) tool (ArcMap 10.7.1) was used to interpolate the values. The tool uses a feature layer as a barrier so that the output raster represents only the area of interest. For the barrier and the extent of the interpolated raster, we used a shapefile of the Lake Puruvesi shoreline. The resolution was set to 5 x 5 m.
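The growing-season averaging that feeds the nutrient interpolation (keep July-September measurements from 2010-2020, then take the mean per location) can be sketched in plain Python. The record layout (location, date, variable, value) and the sample values are hypothetical and do not reflect the VESLA database schema.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

def growing_season_means(records, months=(7, 8, 9), years=(2010, 2020)):
    """Mean value per (location, variable) over the selected months and years.

    records: iterable of (location, date, variable, value) tuples.
    Measurements outside the months or the inclusive year range are dropped.
    """
    groups = defaultdict(list)
    for location, day, variable, value in records:
        if day.month in months and years[0] <= day.year <= years[1]:
            groups[(location, variable)].append(value)
    return {key: mean(values) for key, values in groups.items()}

# Hypothetical measurements.
records = [
    ("site A", date(2015, 7, 10), "NTOT", 400.0),
    ("site A", date(2016, 8, 2), "NTOT", 500.0),
    ("site A", date(2016, 5, 20), "NTOT", 900.0),   # May: outside the season
    ("site A", date(2018, 9, 1), "PTOT", 12.0),
    ("site B", date(2008, 7, 15), "PTOT", 30.0),    # before 2010: excluded
]
means = growing_season_means(records)
# {('site A', 'NTOT'): 450.0, ('site A', 'PTOT'): 12.0}
```

The resulting per-location means are the point values that an interpolation tool would then turn into a raster.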
The SwB tool created an "extent box" around the area of interest, which was removed with the Extract by Mask tool using the shoreline feature layer. After the interpolation, we noticed that either one of the locations was situated on land or the polygon used as the barrier was "leaking". SwB does not interpolate areas that contain no locations with values or that are not connected to the main body of water. To fix this, the raster was extended outwards based on the values of nearby cells and then masked again to remove any cells on land. The phosphorus interpolation produced negative values in the southern parts of Enanlahti in Kontiolahti and of Muholanlahti. These negative values were caused by considerably larger phosphorus values at the Enanlahti Lamminniemi (9 m) and Enanlahti Lamminniemi (4 m) locations compared with the nearby Puruvesi Enanlahti location. The interpolation apparently continued the decreasing trend set by the difference between these locations until it reached negative values. The southern part of the bay, about 750 m, was removed, and new values were calculated from the surrounding cells with the Focal Statistics tool. The interpolations were validated by removing 20 % of the locations and repeating the interpolation; the values at the removed locations were then compared with the interpolated raster. The R² value of the phosphorus interpolation model was 0.91 after removing two outliers, and the R² value of the nitrogen interpolation model was 0.715 after removing one outlier.

Distance to closest reed

Presence/absence maps of aquatic vegetation (Phragmites australis reeds) were also generated from Sentinel-2 MSI data. The processing consisted of extracting one month of data (July 2019) for the green and near-infrared bands from the Sentinel-2 Global Mosaic (S2GM) service and transforming these into normalized difference vegetation indices (NDVIs).
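The index transformation is a per-pixel normalized difference between two bands, (NIR - other) / (NIR + other). A minimal NumPy sketch follows, assuming reflectance arrays for the near-infrared band and the second band named in the text (green); since the conventional NDVI uses the red band instead, the second argument is kept generic. The reflectance values are hypothetical.

```python
import numpy as np

def normalized_difference(nir, other):
    """Per-pixel (nir - other) / (nir + other); NaN where the sum is zero."""
    nir = np.asarray(nir, dtype=float)
    other = np.asarray(other, dtype=float)
    total = nir + other
    out = np.full(total.shape, np.nan)
    # Divide only where the denominator is non-zero; other pixels stay NaN.
    np.divide(nir - other, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances: vegetated pixel, bright neutral pixel, no-data pixel.
nir = np.array([0.40, 0.05, 0.0])
green = np.array([0.10, 0.05, 0.0])
index = normalized_difference(nir, green)
```

The index ranges from -1 to 1, and high values indicate the strong near-infrared reflectance typical of green vegetation.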
After that, Bayesian statistics were used to predict the posterior probability of vegetation occurrence, with distance from shore and NDVI as predictor variables. The posterior probability was thresholded, and the resulting vegetation presence areas were sieved to remove both areas that were too small (fewer than 5 pixels) and areas not directly attached to the shoreline. The resulting map has a 10 m pixel size and tentatively represents the locations of reed belts or other shoreline-attached vegetation. This EO-based layer could also be referred to as helophytes or helophytic macrophytes, as it denotes a specific zone of emergent aquatic plants containing leaf-green, particularly those that grow densely and have horizontally oriented leaves. In some lakes, this layer can represent, for example, dense stands of Equisetum fluviatile, although in most cases it is associated with common reed belts. The approach is described in more detail in Koponen et al. (2022).

Data partitioning

The data were partitioned with a 70/30 split into training and test (interpolation accuracy) data. The split was repeated 100 times for each species by randomly selecting 70 % of the observations, which were used to build each of the SDMs (GLM, GAM, BRT and BART). The partitioning was repeated for each of the three input data areas and each of the 11 species. The input data indexes for replicating the splits are supplied in the data files.

R code

The code files contain scripts for fitting the SDMs described in the paper to the data. R code for the beta regression analyses of the modelling results conducted in the paper is also supplied.
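The repeated 70/30 partitioning can be sketched as below with NumPy. The function name, the seed and the observation count are illustrative assumptions; the authors' actual split indexes are the ones supplied in the data files, and this sketch does not reproduce them.

```python
import numpy as np

def repeated_splits(n_obs, n_repeats=100, train_frac=0.7, seed=1):
    """Return a list of (train_idx, test_idx) index pairs, one per repeat.

    Each repeat draws round(train_frac * n_obs) observations without
    replacement for training; the remaining observations form the test set.
    """
    rng = np.random.default_rng(seed)
    n_train = int(round(train_frac * n_obs))
    splits = []
    for _ in range(n_repeats):
        perm = rng.permutation(n_obs)
        splits.append((np.sort(perm[:n_train]), np.sort(perm[n_train:])))
    return splits

# Hypothetical: 250 observations for one species in one input data area.
splits = repeated_splits(250)
train_idx, test_idx = splits[0]
```

In a full workflow this would run once per combination of input data area and species, with the same index pairs reused for every model type so that GLM, GAM, BRT and BART are compared on identical data.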
Created:
2023-12-14