Data and code used in: "Effects of Random Forest modeling decisions on biogeochemical time series predictions"
Source: DataONE. Updated: 2023-04-07. Indexed: 2024-06-08.
Download link:
https://search.dataone.org/view/ess-dive-7e73979f4993834-20230407T144756424156
Description:
Note: All data and code used to reproduce this analysis are available through the Zenodo DOI and described in README_rf_synthesis.pdf. This data package is associated with "Effects of random forest modeling decisions on biogeochemical time series predictions": https://doi.org/10.1002/lom3.10523 (Regier et al. 2022). We explored the role that parameter decisions, including training/testing data splitting strategies, variable selection, and hyperparameters, play in Random Forest goodness-of-fit by constructing models using 1296 unique parameter combinations to predict concentrations of nitrate, a key nutrient for biogeochemical cycling in aquatic ecosystems. This dataset includes data from the publicly available National Estuarine Research Reserve (NERR) data portal. Data used for modeling are stored in the ‘data’ folder (rf-synthesis-LOM/data/) and organized by site (‘cbv’ or ‘owc’). Within site directories, data are further organized by type (‘meteorology’, ‘nutrients’, and ‘water_quality’). Within these sub-directories, individual .csv files are named with the 3-letter site abbreviation, followed by a 2-letter station abbreviation, a 2- or 3-letter data type abbreviation, and a 4-digit year. For detailed information on how these data were collected and quality-controlled, as well as definitions for column names, please refer to NERR documentation (https://cdmo.baruch.sc.edu/).
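The file naming convention described above can be sketched as a small parser. This is a minimal illustration, not part of the data package: the function name parse_nerr_filename, the NerrFileName type, and the example file names are hypothetical, and the pattern assumes the four fields are concatenated without separators and use only ASCII letters and digits.

```python
import re
from typing import NamedTuple, Optional


class NerrFileName(NamedTuple):
    """Fields encoded in a data file name (hypothetical helper type)."""
    site: str       # 3-letter site abbreviation, e.g. 'cbv' or 'owc'
    station: str    # 2-letter station abbreviation
    data_type: str  # 2- or 3-letter data type abbreviation
    year: int       # 4-digit year


# Assumption: fields are concatenated with no separators, followed by '.csv'.
_PATTERN = re.compile(
    r"^([a-z]{3})([a-z]{2})([a-z]{2,3})(\d{4})\.csv$", re.IGNORECASE
)


def parse_nerr_filename(name: str) -> Optional[NerrFileName]:
    """Split a file name into its fields, or return None if it does not match."""
    match = _PATTERN.match(name)
    if match is None:
        return None
    site, station, data_type, year = match.groups()
    return NerrFileName(site.lower(), station.lower(), data_type.lower(), int(year))


# Hypothetical file names used only to exercise the pattern.
for example in ("cbvtcwq2018.csv", "owcolnut2015.csv", "README.csv"):
    print(example, "->", parse_nerr_filename(example))
```

Names that do not follow the convention return None, so the same function can be used to filter a directory listing before loading files.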
Created:
2023-04-07



