Code, data, and Raster and shape files used in the paramo soil carbon project
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/97RUDG
Description:
PÁRAMO SOC MODELING: REPRODUCIBLE WORKFLOW
==========================================

Overview
--------
This repository contains two R scripts to (1) fit and validate a spatially-aware Random Forest model for soil organic carbon (SOC) in Colombian paramos, and (2) generate national wall-to-wall SOC predictions and sector-level summaries.

Scripts
-------
1) soilCmodel.R
   - Builds land-cover labels (Disturbed, Forest, Paramo). For modeling, the former "Nosoil" class is collapsed into Disturbed.
   - Extracts rasters to points and clusters points on a 100 m grid to avoid leakage across train/test folds.
   - Runs grouped v-fold spatial cross-validation, tunes RF by inner OOB RMSE, computes diagnostics (OOB, random 5-fold, spatial CV) in SOC space using Duan smearing for unbiased back-transform.
   - Saves the finalized model and artifacts for prediction and reporting.

2) soilCprediction.R
   - Loads the finalized model and the Duan smearing factor.
   - Assembles the predictor stack, predicts log-SOC, applies smearing, and outputs SOC density in Mg C ha^-1. Pixels flagged as Nosoil are set to 0.
   - Converts density to Mg per cell using true cell area in hectares.
   - Aggregates totals and statistics by paramo sector and land-cover class.
   - Produces figures and CSVs for the paper.

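The grid clustering and Duan smearing steps listed under Scripts can be illustrated with a small base-R sketch. This is not the code in soilCmodel.R: the coordinates, the `elev` predictor, the simulated SOC values, and the `lm()` fit standing in for the Random Forest are all assumptions made for illustration. It only shows the general idea of assigning whole 100 m cells to folds and of multiplying back-transformed predictions by the mean exponentiated residual.

```r
# Minimal sketch in base R; all data below are simulated, not project data.
set.seed(120)

# Hypothetical projected coordinates in metres (e.g. after transforming to EPSG:3116).
n <- 200
x <- runif(n, 0, 2000)
y <- runif(n, 0, 2000)

# Points falling in the same 100 m x 100 m cell share one cluster ID.
cell_size  <- 100
cluster_id <- paste(floor(x / cell_size), floor(y / cell_size), sep = "_")

# Assign whole clusters (not individual points) to v folds,
# so neighbouring points never straddle train and test sets.
v <- 5
clusters <- unique(cluster_id)
fold_of_cluster <- sample(rep_len(seq_len(v), length(clusters)))
names(fold_of_cluster) <- clusters
fold <- unname(fold_of_cluster[cluster_id])

# Duan smearing: fit on the log scale, then multiply exp(prediction)
# by the mean of the exponentiated residuals.
elev <- runif(n, 3000, 4200)                         # hypothetical predictor
soc  <- exp(3 + 0.0005 * elev + rnorm(n, sd = 0.4))  # simulated SOC, Mg C ha^-1
fit  <- lm(log(soc) ~ elev)                          # stand-in for the Random Forest
smear   <- mean(exp(residuals(fit)))                 # Duan smearing factor
soc_hat <- exp(fitted(fit)) * smear                  # predictions back in SOC space
```

With a log transform and zero-mean residuals the smearing factor is at least 1, which is why a plain `exp()` back-transform tends to underestimate the mean on the original scale.
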
Directory layout (edit paths in scripts if different)
-----------------------------------------------------
geo_dir   = .../Paramo carbon map/GEographic
stats_dir = .../Paramo carbon map/stats2

Required inputs
---------------
Points (CSV):
- carbon_site.csv with columns: Longitude, Latitude, CarbonMgHa

Predictor rasters (aligned to land-cover grid, ~100 m):
- dem3_100.tif, TPI100.tif, slope100.tif
- temp2.tiff (mean T), tempmax2.tiff, precip2.tiff, soilmoist2.tiff
- Cobertura100.tif (grid target)

Vectors:
- corine_paramo2.* (CORINE polygons; fields include corinetext, Clasificac)
- paramos.* (paramo sectors; field NOMBRE_COM)
- paramos_names.csv (two columns: NOMBRE_COM, Sector) for short plot labels

CRS expectations:
- Input points in EPSG:4326
- Clustering for spatial CV uses EPSG:3116 (MAGNA-SIRGAS / Bogota)
- Rasters are internally aligned to the Cobertura100.tif grid

Software requirements
---------------------
Tested with R >= 4.3 and packages:
terra, sf, dplyr, tidyr, ranger, rsample, yardstick, vip, ggplot2, purrr, forcats, scales, stringr, bestNormalize (optional)

Install once in R:

    install.packages(c(
      "terra","sf","dplyr","tidyr","ranger","rsample","yardstick","vip",
      "ggplot2","purrr","forcats","scales","stringr","bestNormalize"
    ))

Each script starts with:

    suppressPackageStartupMessages({
      library(terra); library(sf); library(dplyr); library(tidyr)
      library(ranger); library(rsample); library(yardstick); library(vip)
      library(ggplot2); library(purrr); library(forcats); library(scales); library(stringr)
    })

How to run
----------
1) Fit + validate the model

    Rscript soilCmodel.R

   Outputs (in stats_dir):
   - rf_full.rds (finalized ranger model)
   - smear_full.txt (Duan smearing factor)
   - variable_importance.csv (permutation importance, mean and sd)
   - diagnostics.txt (OOB, random 5-fold, spatial CV metrics)
   - OVP_spatialCV.png (observed vs predicted, pooled folds)
   - imp_bar_RF.png (RF importance with error bars)

2) Predict wall-to-wall + summarize

    Rscript soilCprediction.R

   Outputs (in stats_dir):
   - SOC_pred_final_RF_GAM.tif (SOC density, Mg C ha^-1)
   - SOC_totals_by_sector.csv (Tg C by sector x land-cover)
   - SOC_by_sector_LC_Tg_mean_sd.csv (Tg C plus area-weighted mean/sd in Mg C ha^-1)
   - SOC_national_mean_sd_by_LC.csv (national area-weighted mean/sd in Mg C ha^-1)
   - sector_bars_TgC.png (stacked bars by sector using short labels)

Units
-----
- SOC density outputs are in Mg C ha^-1.
- Totals are in Mg and reported as Tg (Mg / 1e6).
- Cell areas are computed with terra::cellSize(..., unit="m")/10000 to ensure hectares.

Modeling notes
--------------
- Learner: ranger Random Forest, permutation importance, respect.unordered.factors="partition".
- Response transform: log or Yeo-Johnson (when enabled), with Duan smearing to remove retransformation bias when returning to SOC space.
- Spatial CV: grouped v-fold using 100 m clusters to prevent leakage.
- Land cover: modeling uses three classes (Disturbed includes former Nosoil). In mapping, Nosoil pixels are forced to 0 SOC.

Troubleshooting
---------------
- If a write fails with "source and target filename cannot be the same", write to a new filename.
- If sector labels appear misaligned in plots, normalize strings and join short names via paramos_names.csv.
- If national means look ~100x too small, ensure means are area-weighted over valid pixels only (LC present AND SOC not NA), and that areas are in hectares.
- If any join fails, confirm the sector name field (NOMBRE_COM) exists in paramos.shp and in paramos_names.csv.

Reproducibility
---------------
- set.seed(120) is used throughout.
- All area computations are in hectares.
- Scripts are deterministic given the same inputs and package versions.
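
The unit conversions described under Units, and the area-weighting advice under Troubleshooting, can be checked by hand with a few pixels. The sketch below uses base R only and invented numbers; the variable names are hypothetical and the weighted standard deviation shown is one common (population-style) definition, which may differ from what soilCprediction.R computes.

```r
# Simulated pixel values; none of these numbers come from the project.
density_mg_ha <- c(120, 250, NA, 0, 310)             # SOC density, Mg C ha^-1
area_m2       <- c(9980, 9990, 10000, 10010, 10020)  # true cell area, m^2
area_ha       <- area_m2 / 10000                     # 1 ha = 10,000 m^2

# Only pixels with a prediction contribute to totals and means.
valid <- !is.na(density_mg_ha)

# Density (Mg C ha^-1) times area (ha) gives stock per cell (Mg C).
stock_mg <- density_mg_ha * area_ha
total_mg <- sum(stock_mg[valid])
total_tg <- total_mg / 1e6                           # 1 Tg = 1e6 Mg

# Area-weighted mean and sd of density over valid pixels only.
w      <- area_ha[valid] / sum(area_ha[valid])
mean_w <- sum(w * density_mg_ha[valid])
sd_w   <- sqrt(sum(w * (density_mg_ha[valid] - mean_w)^2))
```

A quick consistency check is that the area-weighted mean equals the total stock divided by the total valid area; if the two disagree by a large constant factor, the areas are probably not in hectares or invalid pixels are being counted.
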
Created:
2025-10-19



