Reproducibility package: PCA–SOM–KMeans clustering of LiDAR-derived structural predictors

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://data.mendeley.com/datasets/26pw833vxb

Description:

This dataset supports the study “Unsupervised Machine Learning of LiDAR-Derived Forest Structure to Support Ecosystem Monitoring in Minnesota’s Kawishiwi Ranger District” (Salehnia et al., 2026; under review at Ecological Informatics). The working hypothesis of the study is that LiDAR-derived canopy structure metrics, summarized over a regular grid, can be used to objectively delineate spatially coherent forest structural types (clusters) that are meaningful for ecosystem monitoring and interpretation of forest heterogeneity.

The dataset includes (i) Python code to derive structural predictors from a Canopy Height Model (CHM) raster and (ii) code to perform an unsupervised learning workflow consisting of PCA-based dimensionality reduction, Self-Organizing Map (SOM) training, and K-means clustering. The main data file (predictors_with_latlon.csv) contains gridded observations with Latitude/Longitude (EPSG:4326) and corresponding LiDAR/CHM structural predictors that quantify canopy height distribution, variability, and vertical stratification (e.g., percentiles, variability metrics, and stratum-based measures). These predictors were computed from CHM values after applying a simple quality-control filter to reduce common artifacts (thresholds and justification are documented in the scripts and in the manuscript Methods).

The workflow produces diagnostic outputs (PCA scree/variance summaries and loading tables, SOM grid evaluation metrics and U-matrix visualization, and K-means cluster selection metrics) and assigns each grid cell/point to a structural cluster. The resulting cluster patterns can be interpreted as distinct forest structural regimes across the study area (e.g., differences in canopy height, structural complexity, and strata occupancy), providing a reproducible basis for mapping and monitoring forest structure. Methodological details and parameter settings are described in Salehnia et al. (2026), Sections 2.2–2.5.
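The PCA, SOM, and K-means sequence described above can be illustrated with a minimal NumPy-only sketch. It is not the code shipped in the package: the function names, the 4×4 SOM grid, the 90% variance target, the decay schedules, and the choice to run K-means on the SOM node weights (a common two-stage approach) are all assumptions made for illustration. The actual settings are those in the scripts and in Salehnia et al. (2026).

```python
import numpy as np

def pca_reduce(X, var_target=0.9):
    """Standardize X and keep leading PCs reaching var_target (assumed value)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)  # assumes no constant columns
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = min(int(np.searchsorted(np.cumsum(explained), var_target)) + 1, len(S))
    return Z @ Vt[:k].T, explained

def train_som(data, rows=4, cols=4, n_iter=2000, lr0=0.5, sigma0=2.0, seed=0):
    """Online SOM training with a Gaussian neighborhood (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize node weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                       # linearly decaying rate
        sigma = sigma0 * (1.0 - frac) + 0.5 * frac    # radius shrinks to 0.5
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best-matching unit
        d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-d2 / (2.0 * sigma**2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; empty clusters keep their previous center."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def cluster_cells(X, k=3, seed=0):
    """PCA scores -> SOM nodes -> K-means on nodes -> label per grid cell."""
    scores, explained = pca_reduce(X)
    weights = train_som(scores, seed=seed)
    node_labels, _ = kmeans(weights, k, seed=seed)
    # Each cell inherits the cluster of its best-matching SOM node.
    d2 = ((scores[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return node_labels[d2.argmin(axis=1)], explained
```

Here `X` would be the matrix of structural predictor columns from predictors_with_latlon.csv, with Latitude/Longitude excluded from clustering and kept only for mapping the resulting labels. The predictor column names are not listed in this description, so none are assumed.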
Users can apply the scripts to other CHM/LiDAR products by updating file paths and adapting the CHM quality-control thresholds to match local canopy height ranges and data characteristics.

Intended use: The code and example data can be reused to reproduce the results of the study and/or to apply the same PCA–SOM–K-means clustering approach to other LiDAR/CHM-derived structural datasets.
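To show where the quality-control thresholds enter when the approach is adapted to another CHM product, the following is a rough sketch of per-cell predictor derivation. The function name `chm_predictors`, the 0–60 m validity range, the stratum breakpoints (2, 10, 20 m), and the specific metrics are hypothetical placeholders, not the values used in the study; the real thresholds are documented in the scripts and the manuscript Methods.

```python
import numpy as np

def chm_predictors(chm_cell, min_h=0.0, max_h=60.0, strata=(2.0, 10.0, 20.0)):
    """Summarize the CHM heights (metres) of one grid cell.

    min_h, max_h and strata are assumed example values; adapt them to the
    local canopy height range and data characteristics.
    """
    h = np.asarray(chm_cell, dtype=float).ravel()
    # Quality control: drop NaN/inf pixels and heights outside the valid range.
    h = h[np.isfinite(h)]
    h = h[(h >= min_h) & (h <= max_h)]
    if h.size == 0:
        return None  # no usable pixels in this cell

    p25, p50, p95 = np.percentile(h, [25, 50, 95])
    mean, std = float(h.mean()), float(h.std())
    out = {
        "p25": float(p25),
        "p50": float(p50),
        "p95": float(p95),
        "mean": mean,
        "std": std,
        "cv": std / mean if mean > 0 else 0.0,  # coefficient of variation
    }
    # Strata occupancy: fraction of valid pixels falling in each height band.
    edges = [min_h, *strata, np.inf]
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        out[f"frac_stratum_{i}"] = float(np.mean((h >= lo) & (h < hi)))
    return out
```

Calling `chm_predictors` on the pixel block of each grid cell yields one row of predictors per cell. Only the keyword arguments need to change when canopy heights differ from the assumed range.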

Created:

2026-01-22
